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Preface 


This is a guide to the analysis of spatial data. Spatially arranged measure- 
ments and spatial patterns occur in a surprisingly wide variety of scientific 
disciplines. The origins of human life link studies of the evolution of 
galaxies, the structure of biological cells, and settlement patterns in 
archeology. Ecologists study the interactions among plants and animals. 
Foresters and agriculturalists need to investigate plant competition and 
account for soil variations in their experiments. The estimation of rainfall 
and of ore and petroleum reserves is of prime economic importance. 
Rocks, metals, and tissue and blood cells are all studied at a microscopic 
level. The aim of this book is to bring together the abundance of recent 
research in many fields into the analysis of spatial data and to make 
practically available the methods made possible by the computer revolu- 
tion. 

The emphasis throughout is on looking at data. Each chapter is devoted 
to a particular class of problems and a data format. The two longest and 
most important are on smoothing and interpolation (producing contour 
maps, estimating rainfall or petroleum reserves) and on mapped point 
patterns (trees, towns, galaxies, birds’ nests). Shorter chapters cover: 


The regional variables of economic and human geography. 
Spatially arranged experiments. 

Quadrat counts. 

Sampling a spatially correlated variable. 

Sampling plants and animals and testing their patterns. 


The final chapter looks briefly at the use of image analyzers to investigate 
complex spatial patterns, and stereology: how to gain information on 
three-dimensional structures from linear or planar sections. Some emphasis 
is placed on going beyond simple tests to detect “nonrandom” patterns as 
well as on fitting explanatory models to data. Some general families of 
models are discussed, but the reader is urged to find or invent models that 
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reflect the theories of his or her own discipline, such as central place theory 
for town locations. The techniques presented are designed for both of John 
Tukey’s divisions of exploratory and confirmatory data analysis. 

The level of mathematical difficulty varies considerably. The formal 
prerequisites are few: matrix algebra, some probability and statistics, and 
basic topology in parts of Chapter 9. An acquaintance with time series 
analysis would be heipful, especially for Chapter 5. I have tried to confine 
the formal mathematics to the essential minimum. Mathematically able 
readers will be able to find their fill in the references. It is perhaps 
inevitable that some of the mathematical justifications are far deeper than 
is the practical import of the results. But beware. There is much appealing 
but incorrect mathematics in the spatial literature, and some of the subtlest 
arguments are used to discover undesirable properties of simple proce- 
dures. I recommend readers who find the going tough to skip ahead to the 
examples before seriously tackling the theory. 

Computers, especially computer graphics, are an essential tool in spatial 
statistics. Useful data sets are too large and most of the methods too 
tedious for hand calculation to be contemplated. Even data collection is 
being increasingly automated. The worked examples were analyzed at an 
interactive graphics terminal by FORTRAN programs running on Imperial 
College’s CDC 6500/Cyber 174 system. Unfortunately, the reader cannot 
follow my decisions as I rotated plots, investigated contour levels, and 
altered smoothing parameters. There is no substitute for experience at a 
computer terminal using one’s own data. Therefore, it was a difficult 
decision not to include programs. There was at the time of writing no 
agreed-upon standard for computer graphics, and the availability of plot- 
ting and other utility operations varied widely. The choice of language was 
also debatable. I could only use interactive graphics from FORTRAN, whereas 
microcomputers were becoming available with BASIC or PASCAL. Hints on 
algorithms and computation are included. 

The bibliography is the only example I know of that attempts a compre- 
hensive coverage of the spatial literature. It contains not only references to 
the theory and methods, but a large number of accounts of applications in 
many disciplines as well. Guides to the literature are given at the end of 
several chapters and sections. 


B. D. RIPLEY 


London 
March 1981 
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CHAPTER 1 


Introduction 


1.1 WHY SPATIAL STATISTICS? 


Men have been drawing maps and so studying spatial patterns for millenia, 
yet the need to reduce such information to numbers is rather recent. The 
human eye and brain form a marvelous mechanism with which to analyze 
and recognize patterns, yet they are subjective, likely to tire, and so to err. 
The explosion in computing power available to the average researcher now 
makes it possible to do routinely the intricate computations needed to 
explore complex spatial patterns. 

One sense of the word “statistics” is a collection of numbers, and spatial 
Statistics includes “spatial data analysis,” the reduction of spatial patterns 
to a few clear and useful summaries. But statistics goes beyond this into 
what John Tukey has called “confirmatory data analysis,” in which these 
summaries are compared with what might be expected from theories of 
how the pattern might have originated and developed. Consider, for 
example, Figure 1.1a, which is a map of trees in a rectangular plot. 
Figure 1.1b shows a summary of these data as a graph, together with 
confidence limits for the sort of graph we would get if each tree had been 
placed at random in the plot, without any reference to the positions of the 
rest of the trees. This example shows one of the characteristic features of 
the subject. There are so many different types of spatial patterns that we 
need to summarize the data in one or more graphs rather than by single 
numbers, such as the mean and standard deviation of classical statistics. 

Almost invariably we will have only a single example of a particular 
pattern rather than the many replications of measurements found in the 
experimental sciences. To get some idea of the variability of such data, 
we are forced to make some assumption of stationarity of the underlying 
mechanism that generated the pattern. Such an assumption has often 
been disputed, particularly in the geographic literature. Its validity may 
depend on the questions being asked. For instance, if we are looking at 
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Fig. 1.1 (a) Point patterns of trees. (6) Summary of data with a 95% confidence band. See 
Figure 8.6 for further details. 
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population density we may wish to know whether we need to invoke the 
topography (which might suggest nonstationarity) to explain the observed 
variations in density. Patterns that vary in a systematic way from place to 
place are called heterogeneous (opposite homogeneous). But we might be 
studying the grouping of houses that we might expect to interact, either 
clustering together because of human gregariousness or inhibiting where 
houses need to be close to sufficient land. Patterns can also exhibit 
preferred directions, called anisotropy (opposite isotropy). For example, 
forests that were originally planted in rows may show directionality in the 
crowns of the trees (Ford, 1976). We will assume that the data have been 
subdivided into sufficiently small units or that they have had obvious 
trends removed to permit us, where necessary, to invoke homogeneity or 
isotropy. 


1.2 TYPES OF DATA 


The basic subdivision of this volume is by the type of data to be analyzed. 
The tree positions given in Figure 1.la are an example of a point pattern. 
Other examples are the locations of birds’ nests, of imperfections in metals 
or rocks, galaxies, towns, and earthquakes. Of course, none of these is 
actually a point, but in each case the sizes of the objects are so small 
compared with the distances between them that their size may be ignored. 
(Sometimes size is an important explanatory variable associated with a 
point. For example, we might expect the area of the hinterland of a town 
to depend on its population size.) Maps of point patterns are discussed in 
Chapter 8. 

Sometimes points are so numerous that complete mapping would be an 
unjustified effort (consider clover plants in a grassland). Two methods of 
sampling such point patterns are discussed in Chapters 6 and 7. In 
Chapter 6 we consider methods based on taking sample areas, called 
quadrats, and counting objects within each, whereas in Chapter 7 the 
methods are based on measuring distances to or between objects. Chapter 
7 also deals with two cases in which complete mapping is either uneco- 
nomical or impossible; trees in a dense forest and animal populations such 
as deer and moorland grouse (game birds). 

Many variables that were originally point patterns are recorded as 
regional totals, such as census information. If these regions are genuinely 
distinct, we may wish to test for correlation between the regional statistics, 
taking account of the connections between the regions measured by, say, 
the lengths of the common borders (if any) or freight costs between them. 
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Fig. 1.2 (a), (5) A simulated surface. (c), (d) Two reconstructions from the sample points 
indicated as circles. 
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Fig. 1.2 (continued) 
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Summary measures for what is known as “spatial autocorrelation” are 
discussed in Section 5.4. They are particularly useful when applied to the 
residuals from the regression of one regional statistic on others. 

Where the regions are small administrative units we might wish to 
smooth the data to produce a map of population density, average income, 
or similar variable. This problem of reconstructing a surface from irregu- 
larly spaced sample points is common; all topographical maps are 
prepared from such data, as is rainfall information. Geologists, oil pros- 
pectors, and mining engineers all have to reconstruct facets of an under- 
ground pattern such as the volume and average grade of ore in various 
parts of a mine, using spatially arranged samples. Such problems are 
considered in Chapter 4. Figure 1.2 illustrates a surface and two recon- 
structions. 

Usually the locations of the sample points are fixed from other consider- 
ations, but in Chapter 3 we consider how sample points should be chosen 
to give the best estimate of the average level of a surface. 

Data arranged on a rectangular grid are not as common as might be 
expected by analogy with time-series theory. They seem to arise only 
from man’s experiments, either where he has deliberately sampled sys- 
tematically or from agricultural field trials in which a field has been 
divided into rectangular parcels. Clearly, we would expect neighboring 
plots to have similar fertility and hence that the yields would be spatially 
autocorrelated. We show in Chapter 5 how such data might be analyzed. 

The least explored class of patterns are those of two or more phases 
forming a mosaic. Patterns of vegetation provide two-dimensional exam- 
ples, but most of the interest is in three dimensions, in bone and tissue and 
rock grains and pores. Descriptions of patterns such as that shown in 
Figure 1.3 were facilitated by the invention of image analyzers during the 
1960s, these being scanning microscopes connected to computers to analyze 
the vast amounts of output. Stereology is the theory of reconstructing 
information on three-dimensional patterns from planar sections (see, for 
example, Figure 1.3) or linear probes. This area is the subject of Chapter 9. 

All the models of the mechanisms that might generate patterns described 
in the chapters for each type of data are stochastic processes. Chapter 2, 
on “basic stochastic processes,” gives an introduction to what is needed of 
the mathematical theory, to generic families of models, and to ways in 
which the computer can be used to experiment with models. 

Most of the theory and methods apply equally in two or three dimen- 
sions. Where formulas depend on the dimension, only the two- 
dimensional case is given unless otherwise stated. Planar data are by far 
the most common; all the examples are planar. 
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Fig. 1.3 Simplified pore space (black) in a section of smackover carbonate rock. 


Spatial Topics Omitted 


This volume concentrates on information on location, ignoring the con- 
cepts of shape and form reflected in the monographs of Grenander (1976, 
1978), Mandelbrot (1977), and Bookstein (1978). Some specialized topics 
omitted are on the spread of epidemics (Bartholomew, 1973; Mollison, 
1977) and percolation theory (Shante and Kilpatrick, 1971; Welsh, 1977; 
Smythe and Wierman, 1978). Each of these references is more concerned 
with mathematical modeling than with analyzing data. 

Little attention is given here to space-time problems. Many of the 
same methods can be used, but adequate data seem rare (earthquake 
occurrences being an exception). Often the best way to deal with space— 
time data is to compare the maps in successive time periods. Another 
generalization is to multitype problems in which the objects are of different 
types or where two or more patterns or surfaces are to be related. Again, 
the extension of many of the methods is simple. Whenever a pair of 
points is considered, take one from each of the two patterns or surfaces. 
If three or more surfaces or patterns are considered, take them in pairs. 
In general, the theory of multitype procedures is not satisfactory and there 
are few examples of its use. Pielou (1977) gives examples of some of the 
methods of “classical” statistics used on these problems. 
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CHAPTER 2 


Basic Stochastic Processes 


This chapter assumes a basic knowledge of probability theory and sets up 
some of the background of the models and methods used in later chapters. 
Section 2.4 is more mathematical and is not necessary for an understand- 
ing of the rest of the material (although its ideas are used in Sections 5.2 
and 8.4). 


2.1. DEFINITIONS 


A stochastic process is a collection of random variables {Z(1)|t€T} inde- 
xed by a set TJ. It has been usual to take T to be a subset of the real 
numbers, say {1,2,3---} or [0,00). However, we need more general 
index sets such as pairs of integers (labeling the plots in a field trial), the 
plane (labeling topographic heights), and rectangles in the plane (labeling 
counts of plants). The great distinction between these indices and those 
representing time is that the latter have an ordering. 

The Daniell-Kolmogorov theorem states that to specify a stochastic 
process all we have to do is to give the joint distributions of any finite 
subset {Z(t,),..., Z(t,)} in a consistent way, requiring 


P(Z(t,)€A,,i=1,...,m, Z(s) ER) =P(Z(t;)EA;,,i=1,...,m) 


Such a specification is called the distribution of the process. We avoid 
subtle mathematics by only considering a finite number of observations on 
a stochastic process (except for the differentiability properties in Section 
4.4). 

We say that the stochastic process is stationary under translations or 
homogeneous if the distribution is unchanged when the origin of the index 
set is translated. For this to make sense the index set has to be un- 
bounded; it has to be either all pairs of integers or the whole plane. If T 
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is the whole of the plane or three-dimensional space, we can also consider 
processes that are stationary under rotations about the origin, called 
isotropic. 'Womogeneous and isotropic processes are stationary under rigid 
motions. The philosophy behind these definitions is discussed in Chapter 
1. Note that they can, at most, be partially checked by, for example, 
splitting the study region into disjoint parts and checking their similarity. 


2.2 COVARIANCES AND SPECTRA 


The covariance C and correlation R between Z(s) and Z(t) for two points 
in T are defined by 


C(s,t)=E| {Z(s)—E(Z(s))} {Z(t) -E(Z(1))} | 


R(s,t)=C(s,1)/V{C(s,s)C(t,1)} 


Homogeneity implies that C and R depend only on the vector h from s to 
t, whereas with isotropy they depend only on d(s,1t). We will use the 
notation Cth) or C(r) for these reductions. Note that by symmetry 
C(h)=C(—h), but C((—A,,4,)) may differ from C((A,,h,)). We will 
usually plot C in the right half-plane; the other half-plane is found by a 
half-turn rotation. 

In general the distribution of a stochastic process is not completely 
determined by the mean m(s)=E[Z(s)] and covariance C(s,¢). This is 
the case for an important class of processes, the Gaussian processes defined 
by the property that all finite collections {Z(t,),..., Z(¢,)} are joint Nor- 
mal (that is, every linear combination has a Normal distribution). It is 
important to know which covariance functions can occur, for given m and 
C we can construct a Gaussian process via the Daniell-Kolmogorov 
theorem with that mean and covariance. The necessary and sufficient 
condition is that C should be nonnegative definite and symmetric, that is, 
that C(t, s)=C(s, t) and 


[Eaz0)| = p> DaajCli,, t,)>0 (2.1) 


for all n, a,,...,a,, t),-.-,¢, (Breiman, 1968, Chapter 11). We often ask 
that C be strictly positive definite when (2.1) must be nonzero unless all a, 
are zero. 


COVARIANCES AND SPECTRA tl 


This condition of nonnegative definiteness occurs elsewhere, thus 
enabling us to give examples of valid covariance functions. The charac- 
teristic function of a d-dimensional symmetric random vector X is a non- 
negative definite continuous symmetric function on R%. A continuous 
homogeneous covariance function of a stochastic process on R? will be 
proportional to such a characteristic function. If X is rotationally sym- 
metric, the covariance is isotropic. Taking the d-dimensional Cauchy and 
Normal distributions (with densities proportional to 1/(1+a||x||?) and 
exp —a||x||7) shows that e~% and e ~ar? are both isotropic covariances in 
any number of dimensions. The spectral density f is defined by 


1 


(27)* 


fle)= f exp{ —iw™h}C(h) dh (2.2) 


when this integral exists. Then 


C(h) = f exp{ tie" h} f(w) deo (2.3) 


Thus f/C(0) is the pdf of a random vector with characteristic function 
C/C(0). For processes on a lattice (2.2) is replaced by a sum and only 
frequencies for which each component is in the range [—7, 7] are consid- 
ered, so the integration in (2.3) is restricted to[—7,7]?. Any nonnegative 
function that gives a finite value of C(O) in (2.3) is a spectral density. 

The spectral density inherits the symmetry condition f(—@)=f(w) from 
the covariance function. If the covariance is isotropic f(w) becomes a 
function of r= ||@|| only, and we have 


Hr)= 52 [doer )C(r rar 
C(r)=2m fdr fn)rde 


in R?, where J, is the Bessel function (Quenouille, 1949). 

The requirement of isotropy on a covariance function is quite restrictive. 
Schoenberg (1938) conjectured that such a function was continuous, except 
possibly at the origin. Furthermore, Matérn (1960, pp. 13—19) shows that 


C(r)> inf {k1(2/u)*J,(u)}C(0) k= (d—-2)/2 


so that isotropic correlations are bounded below by —0.403 in R? and 
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—0.218 in R*. Isotropic correlation functions are usually specified either 
by giving an isotropic spectral density or by “mixing” the family e~°. 
Suppose we choose a from some distribution, then use a process with 
correlation function e~*”. The correlation function of the mixed process 
is E(e~°"). This argument shows that any Laplace transform can be a 
covariance function. (This is the class of functions with (— 1)"C(r) 20 
for n=0,1,2,..., and all r>0). One such family of functions are those 
proportional to r’K,(ar) for »>0, with spectral densities proportional to 
1/(b+||@|/?)’**/*. Here K, is a Bessel function. The exponential corre- 
lation function is the special case pai (Whittle, 1954, 1956, 1963a). 

The class of known examples of isotropic correlation functions is not 
totally satisfactory, for in practice one often finds 


var{ f 200) ax| ~const(area(4)}*""/? (2.4) 
A 


A famous example is given by Fairfield Smith (1938), who found A~3/2 
for yields from wheat trials. Whittle (1956) showed that (2.4) is equivalent 
to C(r) and f(r) behaving as r~* and +“~® for large r and small r, 
whereas all the standard examples of isotropic covariances decay exponen- 
tially at large distances. 

One way to form an isotropic process in R* is to take a homogeneous 
process Z, with covariance function C, on R, to let Z(x)=Z,(x,) and then 
give the whole of each realization an independent uniformly distributed 
rotation about the origin in R*. Then the covariance function of Z is 


2T(d/2) 


r)= r)(1—v? 49? ; 
ai) J Cilerya= eS do (2.5) 


{Matheron, 1973, equation (4.1)]. For d=3 we have the simple results 
1 d 
C(r)= f Cord,  C(r)= 2 [rc(r)] (2.6) 
ft) r 


In fact (2.5) is the general form of an isotropic covariance in R*. It can 
be re-expressed as 


C(r)=E{C(rV)} (2.7) 
where V is the first coordinate of an independent uniformly distributed 


point on the surface of the unit ball in R“. We have noted that C/C(0) is 
the characteristic function of a random vector X. Because C is isotropic 
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X has a rotationally symmetric distribution and 
C(r)=C(0) E(exp{i|[X||V7}) (2.8) 


Comparison of (2.7) and (2.8) shows that we can take C,(t)= 
C(0) E(exp{it|{X|{}), which is nonnegative definite and symmetric and so a 
covariance function in R'. 

The inversion of (2.5) to find C, from C provides a way to simulate these 
processes, as discussed in Section 2.5. 

Whittle (1954, 1956), Heine (1955), and Bartlett (1975) discuss the 
definition of continuous stochastic processes via differential equations. 
Stochastic differential equations theory is needed to justify their manipula- 
tions, which lead to explanatory models for some of the covariances 
studied here. 
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Point patterns convey a different sort of spatial information from those 
processes considered so far. They can be included by defining Z(x)=1 if 
there is a point at x,0 otherwise. This representation is useless, however, 
for P(Z(x)=1) is usually zero and the distribution of the process then 
contains no information at all. We overcome this problem by indexing 
the stochastic process not by points but by sets, so Z(A) is the number of 
points in set A. Every realization of a point process is then a countable 
set of points, of which a finite number fall within any bounded set. The 
points can certainly be located by knowing the counts in each rectangle. 
In fact, it is sufficient to know which rectangles are nonempty. 

The basic point process is a Poisson process, defined by either or both of 
the following properties: 


1. The number of points in any set A has a Poisson distribution mean 
ACA). 
2. Counts in disjoint sets are independent. 


Here A is a measure giving finite mass to bounded sets, called the mean 
measure. It is often defined by A(A)=/,A(x)a@x for some nonnegative 
bounded function A(x). A homogeneous Poisson process has mean mea- 
sure A(A)=A area (A), where A is a constant known as the intensity, the 
expected number of points per unit area. Note that a homogeneous 
Poisson process is automatically isotropic. 

The Poisson process can also be defined in a more constructive way. 
Consider the process on a bounded set E. By property 2 it is sufficient to 
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construct the process independently on each of a partition of bounded sets. 
A binomial process on E is defined by taking N independent identically 
distributed points in E. To form a Poisson process we take N to have a 
Poisson distribution with mean A(£) and then take a binomial process 
with N points on E; each point having probability density function 
A(x) /A(E). 

Poisson processes are very convenient building blocks from which to 
generate other point processes, some examples of which are given in 
Chapters 6 and 8. They can be regarded as the analogue of independent 
observations and are often called “random” outside mathematics. 
(Mathematicians call defining property 2 “purely random” or “completely 
random.”) 

The formal mathematics of point processes is given in Kallenberg (1976) 
and Matthes et al. (1978). The volume edited by Lewis (1972) contains 
some elementary introductions and almost exclusively one-dimensional 
applications. 


2.4 GIBBS PROCESSES 


Gibbs processes are borrowed from statistical physics and provide a way to 
build new processes from old. We start with a base process P, and then 
define a new process P by giving the probability density @ of P with 
respect to Py. This density (formally the Radon-—Nikodym derivative 
dP/dP,) measures the amount by which a particular realization is “more 
likely” under the new distribution. We can, for instance, prohibit a whole 
class of realizations by setting ¢=0 on this class. 

Unfortunately, @ is usually known only up to a normalizing factor that 
must be chosen to make the new distribution a genuine probability with 
total mass one. For example, we can define a point process in which no 
pair of points is closer than D (a model for the centers of nonoverlapping 
disks of diameter D) by excluding all realizations of a Poisson process in 
which two points are closer than D. Then, ¢ for the remaining realiza- 
tions is the reciprocal of the probability that overlap does not occur, and 
this probability is impossible to compute analytically. 

The second problem illustrated by this example is that usually we must 
confine attention to a bounded region, since in the whole plane the 
probability of no overlap is zero. Observations are always on a bounded 
region, so this is not too serious a problem for us. Such processes cannot 
be truly homogeneous and isotropic, but we can usually arrange for ¢ to be 
independent of the origin and orientation of the coordinate axes. In 
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Statistical physics it is important to be able to define Gibbs processes on 
the whole of R°®. Two equivalent procedures are (1) to define the process 
on a bounded region (with suitable boundary conditions) and let this 
region expand, and (2) to consider the conditional distribution in each 
bounded region, conditioning on the points in the rest of space. The 
problem and interest in these approaches is whether they define a unique 
process. Preston (1974, 1976) and Ruelle (1969, 1970) consider these ideas 
in great depth. 

Markov processes have proved extremely useful in one dimension and 
several attempts have been made to define analogues on arbitrary spaces 
including the plane. They all seem less successful because the order 
property of the line (which means that a single point partitions time into 
past and future) is fundamental to the theory of Markov processes. 
Instead of past and future we need a rule that tells us for each pair of 
points whether or not they are neighbors. For observations on a lattice 
such rules are usually obvious. For points in a plane it is usual to define a 
pair to be neighbors if they are closer than a given distance. Define the 
environment E(A) of a set A to be the set of neighbors of points in A. 
We call a process Markov if the conditional distribution on A given the 
rest of the process depends only on the process in E(A)\A. (This gives 
the conventional definition for one-dimensional processes in discrete time 
but a different set of processes in continuous time.) 

Two important classes of examples are Gibbs processes with base 
process independent observations on sites of a lattice, and those with base 
process a Poisson process on a bounded region. In both cases the 
base process is Markov, and the Gibbs process is Markov if and only if 


(x)= IT yy) (2.9) 


ycx 


and y¥(y)= 1, unless each pair of points in y is a pair of neighbors. For a 
point process x in (2.9) is the collection of points, whereas for a lattice with 
n sites, x is the observations at all n sites, and yCx is a subset of the 
observations. The lattice case is the so-called Hammersley-— Clifford theo- 
rem and is much proved. Speed (1978) sketches a neat algebraic treat- 
ment. The point process case is due to Ripley and Kelly (1977) whose 
proof is in the same spirit as Speed’s. Equation (2.9) is often re-expressed 
as 


$(x)=exp 2 V(y) 


ycx 
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For a system of particles governed by forces V(y) will be the potential of 
that configuration of particles. For a Markov process V(y)=0 unless all 
particles in y are neighbors. Typically, V(y)=0 whenever y contains three 
or more particles. Then 


(x) =A exp 2 {e+ 2 V({E,n}) (2.10) 


»VeEx 


For an invariant ¢ we take V({&}) constant and V({&, n}) some function of 
d(é, 7). Functions ¢ of the form (2.10) then define pair-potential processes 
with 


(x) =Ab*™ TT] h(d(x,, x;)) (2.11) 


i<j 


# (x) =number of points in realization x= {x;}. 


2.55 MONTE CARLO METHODS AND SIMULATION 


We have stressed constructive definitions of stochastic processes because 
we do want to be able to construct realizations, a process known as 
simulation. It is a very intuitive idea that we should be able to assess the 
fit of a stochastic process to our data by making the computer simulate the 
process and to compare the outcomes with the data. Suppose we are 
interested in the distribution of a statistic T, which may be unavailable 
analytically or have an asymptotic or approximate answer the validity of 
which is unknown. We can simulate the process m times, calculate the 
statistic on each simulation, and then compare the empirical distribution 
with our theoretical work or build up a table of percentage points from the 
empirical distribution. 

Barnard (1963) took this idea a stage further. If the data were actually 
generated by our hypothetical model, the statistic from the data would be 
just another simulation, and the probability that it is the rth most extreme 
isr/(m+1)._ By a suitable choice of r and m we can obtain a significance 
test of any desired size of the null hypothesis that the data were generated 
by the simulated stochastic process. One-sided or two-sided tests can be 
achieved by suitably defining “extreme.” If the distribution of the test 
statistic is not continuous, the values of the statistic may tie. In theory, 
the tied values should be ordered at random. In practice, this will usually 
be ignored. 

The critical region used in Barnard’s test procedure is based on quantiles 
of the empirical distribution of T. Because this is random, we might 
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expect the test to be of lower power than that based on the exact 
distribution of T were this to be available. The loss in power was 
investigated by Hope (1968) and Marriott (1979). Their conclusions sug- 
gest that r>5 provides a sensible test with little loss in power. 

Monte Carlo methods (when distinguished from simulations) usually 
involve tricks to increase the “value” of m simulations. Ideas include 
conditioning on variables and finding the conditional distribution analyti- 
cally, intentionally using positively or negatively correlated simulations, 
and simulating some related processes. A problem with using simulation 
to find empirical distributions is that one is usually interested in the tails of 
the distribution and most of the simulations provide very little relevant 
information. One possible application of this set of ideas would be to 
simulate point processes conditional on the total number N of points and 
to choose N from a distribution other than its true one to make extreme 
values of the test statistic more probable. Most of the knowledge about 
Monte Carlo methods is widely dispersed or even unpublished and uses in 
spatial statistics are almost nonexistent. Hammersley and Handscomb 
(1964) give some of the earlier ideas. 

Of course, we do have to find efficient ways to simulate our processes. 
There is a general technique for Gaussian processes. Suppose §),...,&, 
are independent M(0,1) random variables. Then the covariance matrix of 
Pf is PP’. Thus to generate joint Normal random variables Lisiy Ly 
with means m,,...,m, and covariance matrix 2 we take a matrix P with 
PP’=% and let Z,=m,;+ Zp,,§;. The simplest way to form P numeri- 
cally is to note the Cholesky decomposition, which states that there is a 
lower triangular matrix L with LL7?=S. [Chambers (1977) is a good 
reference for such numerical methods and for sources of subroutines.] Of 
course, if a matrix P can be found analytically this should be used. If 
=~! is known, the Cholesky method is useful, for inversion of L is trivial 
numerically. 

Equations (2.5) and (2.6) form the basis of what Matheron (1973) called 
the “turning band” method to simulate homogeneous isotropic Gaussian 
processes in R? or R®. Simulate a process Z, on R with covariance C, 
and form Z as described above equation (2.5). Realizations of Z are 
rather strange (and not joint Normal); hence it is advisable to take m 
independent copies of Z and form their sum divided by Vm. This 
process has the required covariance C. A modification is to take eight 
independent copies of Z, and form 


8 
Z(x)= 2) Z[ (Vx), | 
1 


where Z; are independent processes with covariances C,/8 and V, is a 
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rotation of 27j/8. This modification makes it particularly easy to simu- 
late the values of Z on a finely spaced lattice in the plane. 

A Poisson process is most easily simulated via its constructive definition. 
Efficient algorithms for sampling Poisson random variables are discussed 
by Atkinson (1979a,b). Rejection sampling provides a general scheme 
whereby the points of a binomial process can be found. Suppose we have 
a supply Y,,¥Y5,..., of random variables with probability density g. To 
generate samples with density h, for each Y, generate an independent 
random variable U, on [0,1], and accept Y, if MU, <A(Y,)/g(Y,); otherwise, 
try Y,,,- Here M is an upper bound for A/g. For example, to find 
uniformly distributed points in some irregularly shaped region D, enclose 
D in [a,b]x[c,d] and generate points as (a+(b—a)U,,c+(d—c)U,), 
where U,,U, are independent uniform (0,1) random variables, and accept 
those points that fall within D. 

Rejection sampling can also be applied to whole realizations and so can 
be used to simulate Gibbs processes with a bounded ¢. Note that since M 
need only be an upper bound for ¢, we do not need to know the awkward 
normalizing constant A of (2.18). Unfortunately, ¢ is almost always too 
variable for rejection sampling to be a feasible method and tricks have to 
be used. Ripley (1979a) gives one trick for Gibbs point processes. Hast- 
ings (1970) and Peskun (1973) discuss a family of alternatives to rejection 
sampling well adapted to Gibbs processes. 
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CHAPTER 3 


Spatial Sampling 


This chapter deals with one specific sampling problem. There is a con- 
tinuous surface Z(x) for which we need the mean value within a region A; 
we are at liberty to choose n points anywhere within A, and then measure 
Z at those points. Other sampling problems are discussed in Chapters 6, 
7, and 9. Leta be the area (in R?) or volume (in R?) of A. We will use 


z=-13.2(%,) to estimate 2(A)= [ Z(x)dx/a (3.1) 
n& A 


Assuming a continuous surface ensures that the integral in (3.1) makes 
sense. We assume that Z is a realization of a homogeneous stochastic 
process with covariance function C and spectral density f. Thus we are 
taking a superpopulation view of sampling and will be taking expectations 
both over any randomization involved in sampling and over the surface Z. 


3.1 SAMPLING SCHEMES 


There are several standard schemes for arranging the n sample points 
{x,,...,X,}. Figure 3.1 illustrates the possibilities. 

Figure 3.1a,c has random elements, whereas Figure 3.1), d is systematic. 
Uniform random sampling, illustrated in Figure 3.la operates by choosing 
each point independently uniformly within A. Stratified random sampling 
takes a uniform random sample of size k from each of m strata or 
subareas, son=km. Figure 3.1c illustrates the usual case of square strata. 
(Although by no means necessary, square strata are almost universal.) 
Systematic samples can be of many types. Figure 3.15 illustrates the 
usual case of an aligned centric systematic sample. The only randomiza- 
tion that could be applied to such a sample would be to drop the “centric” 
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Fig. 3.1 Four spatial sampling schemes for 25 sample points. (a) Uniform random. (6) 
Centric systematic. (c) Stratified random. (d) Nonaligned systematic. 
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Fig. 3.1. (continued) 
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and to randomize over the starting position. Figure 3.1d illustrates another 
type of random sample. The position of the (i, /)th sample is (((i— 1)+ 
a,J4,{(j—1)+8,]A), where A is the spacing and (a;) and (B;) are precho- 
sen constants between zero and one. To evaluate these sampling schemes 
we will calculate nvar[Z—Z(A)}. 


3.2. ERROR VARIANCES 
Let p= E(Z(x)) and o? = var{ Z(x)] 
Uniform Random Sampling 


We take expectations first over the random positioning of the points 
Kesey Mp 


E{Z-2Z(A)|Z} “734 J Zo) dx~= f Z@x)dx=0 


var{ Z~Z(A)|Z} = = > var{Z(x,)} by independence 
n 


tl 
oi 


var[ Z(x,)]= al [ Z(x)-Z( A)? dx 


na 


ny 
a 


M 


J, [Z(x)—n Pax 


> 


— ff [ze0~n)(Z0)- 1] dx dy 


var{Z—2(A)} =E[ var{ Z—2(4)|Z} ]=-[o? -£(C(%Y)}] (3.2) 


where X,Y are independent uniformly distributed points in A. 
For A large compared with the effective range of the covariance func- 
tion the second term will be negligible. 


Stratified Random Sampling 


Suppose A is partitioned into m strata S),...,S,, each of area s. First 
averaging over the random sampling 


e(Z\z)=#(2 52Z| z)-2 D4s)=H4) 


ERROR VARIANCES 23 


SS [ (Z00)~ ny ax 


var{ Z—Z(A)|Z} = 2 2 is 


— ff (2)—n)(Zy)—-n) dx dy 
AYaany 


é é 


using the results above for the averages Zines for each stratum. 
Taking expectations over the process 


var(Z—Z(A))= | var{Z—Z(A)|Z} | 


=D [0?-£(cX%,.¥)}] 


— 


=—|0?-E{(C(X,,¥,)} | (3.3) 


n 

where now X,,Y, are random points in the ith stratum. The point of 
stratified sampling is that we should choose strata sufficiently small to 
make E{C(X, Y)} as large as possible, so that (3.3) is less than (3.2). 


Systematic Sampling 


Let the sample be at points labeled by uG A. Then 
= 1 1 
E(Z)= 2 > E(Z(u)) = . De=p 
u u 


2 
var( Z— Z(A)) )=54) 2 E (2-2) | 


Ect ~ 5 SEZ) p)(Z(A)—n)} 
+E{Z(A)-p} 
== DCuy)-25 aa f Cluy)ay 


+f f couyaxdy (3.4) 


To make further progress we assume there is a (r Xs) grid of points, of step 
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size A in both directions, and use stationarity with C(x,y)=c{x—y} to find 


nvar(Z—Z(A))~ =ZE(I- ll\( - 2) e(acu, 0)} 


re PL) Ll c((x,»)) dea 


the approximation involving neglecting the difference between r and r+1, 
sands+1. Again approximating by assuming that the range of c is small 
compared with rA and sA we find 


nvat{Z—Z(A)} = > c{A(u, v)} 


u, v integers 
ae ie c{(x, y)} dx dy (3.5) 


which in terms of the spectral density / is 


nvar(Z-Z(4)} =A = AAP P)} 10.0 (3.6) 


integers 


Several conclusions can be drawn from (3.2), (3.3), and (3.6). For an 
isotropic covariance C(X, Y)=c(||X— Y||) so 


E(C(X,Y)) = f° e(r)baC7) ar 


where b,(r) is the density function of the distribution of the distance 
between two points uniformly distributed in A. Clearly, this is largest for 
compact regions such as circles or squares. This is relevant to the choice 
of stratum shape. If we assume all strata are congruent to S, (3.3) 
becomes 


nvar( Z-2(A))=0° 1 f° R(rbg(r) a (3.7) 


The gain from stratification will be most when R(r) is large for all r up to 
the diameter of S, but becomes negligible on the scale of S. This suggests 
that for monotonically decreasing correlation functions we should take 
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small strata; hence k will be small. For a square stratum of side A (3.3) 
becomes 


4 


i- Duy? 


= f f((u,»)) 


(1—cos #A)(1 —cos 2) dudv (3.8) 


Compare (3.8) with (3.6). For frequencies at which (uA) and (vA) are 
both small, the second factor in (3.8) is near 1, so both formulas are 
approximations to {f(w)dw except that both reduce the weights given to 
low frequencies, more sharply for (3.6) than for (3.8). Thus, if low 
frequencies are dominant (corresponding to strong local positive correla- 
tion), both stratified random and systematic sampling should do well 
relative to uniform random sampling. Furthermore, we would expect 
systematic sampling to be best unless there is a sharp peak in the spectral 
density at one of the frequencies summed in (3.6). In practice, this would 
only occur if the process has a strong periodicity with wavelength the basic 
sampling interval A along either axis or with wavelength V2 A along a 
diagonal. 


3.3. ESTIMATING SAMPLING ERRORS 


For uniform random sampling we would compute the sample variance 
s =]2 
=> | 2(%,)-Z}/(n-1) (3.9) 
1 
Taking expectations over the randomization we get 


am' f [2(x)- ZA) J dx 
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which gives us 
o* —E(C(X,Y)) 


when we average over the process. From (3.2) s?/n is an unbiased 
estimator of the sampling error variance. 

For stratified random sampling with k, the number of points per 
stratum, at least 2 we can apply (3.9) to each stratum. The average within 
stratum variance is then an unbiased estimator of nvar(Z —Z(A)). How- 
ever, we have seen that the latter is smallest when the strata are chosen as 
small and compact as possible. Ideally, we would choose n strata with 
k=1. In this case, or with systematic sampling, we have no information 
left with which to estimate the sampling error. 

Clearly, with systematic sampling we will never be able to assess the 
variability due to the positioning of the grid, so we must assume that this is 
small; that is, that there is no periodicity in the process at the sampling 
wavelength. Finney (1948, 1950, 1953) has warned of this problem and 
presented an example of apparent periodicity in a forest survey. Matérn 
(1960) pointed out that this could be pseudo-periodicity as described in 
time series by Yule. However, (3.6) makes clear that either true or 
pseudo-periodicity is very detrimental to the precision of a systematic 
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Fig. 3.2 Artificial strata for the estimation of the sampling variance with a 10x 10 centric 
systematic sample. 
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sample. Milne (1959) found inconsistencies in Finney’s example and 
suggested that the periodicity is, in fact, caused by defects in the sampling, 
for the forest was assigned to the group of enumerators in a periodic 
manner. Even if there were an unsuspected periodicity in the process 
being sampled, Milne concludes that “the danger to centric systematic 
sampling from unsuspected periodic variation is so small as to be scarcely 
worth a thought.” 

There remains the problem of how to estimate the sampling variance in 
systematic samples or with stratified random sampling with one sample per 
stratum. Two possibilities are to use formula (3.9) and regard s?/n as the 
error variance or to impose larger strata on the sample, as illustrated in 
Figure 3.2, and to use the stratified sampling formula formed by averaging 
(3.9) over each of these artificial strata and dividing by n. Milne has an 
empirical study that suggests that either method gives a good idea of the 
true sampling variance. Matérn (1960) shows theoretically that this pro- 
cedure may (slightly) overestimate the sampling variance. 


3.4 OPTIMAL LOCATION OF SAMPLES 


Dalenius et al. (1961) consider the optimal location of sampling points for 
Zubrzycki’s process (Section 4.4). They show that a sampling scheme 
should give as high a degree of overlap as possible between disks centered 
on the sample points with radius R, that defining the process. Clearly, 
this needs a systematic sample. They show that the optimal schemes (in 
the sense of minimum achievable sampling variance per sample point) are 
various triangular, rectangular, and hexagonal lattices, the choice depend- 
ing on the number of samples per unit area and R. They conjecture that 
an equilateral triangular lattice is optimal for all exponential covariance 
functions and hence (by mixing) for all completely monotonic isotropic 
covariances. 
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CHAPTER 4 


Smoothing and Interpolation 


Throughout this chapter we assume that we are given observations Z(x,), 
and we wish to map the surface Z within a region D. The sample points 
X,,--.,Xy are usually, but not always, inside D. They might be on a 
regular grid, as suggested by the theory of Chapter 3, or they might be 
those points at which data is available, chosen for other reasons. For 
instance, networks of rain gauges are set up where observers are available, 
and data on oil and mineral fields are available where drilling occurred (at 
spots thought to be fruitful) and from those parts of the field that are being 
exploited. Under these circumstances, the sample mean may be seriously 
biased as an estimator of the mean level of the surface within D. An 
alternative is to fit an interpolating surface to the data values and to find 
its average. Such a surface can have independent uses. In mining, a 
map of the mineral grade (for example, percentage of copper) will help 
plan the mining operation as well as give information on which parcels will 
have a high enough average grade to make processing economic. It may 
also be helpful to have a smoothed map to indicate the broad features of 
the data as well as an interpolated map for prediction. Indeed, if the data 
are themselves from samples as in mining or soil surveys, they may be 
thought to have an appreciable measurement error, possibly making inter- 
polation undesirable. 

Often field workers will have extra, qualitative, information on the 
surface to be analyzed, or such data may also be available by analogy, for 
instance, from similar mines that have been worked out. It is important 
to have some way of introducing this information. Some of the methods 
to be described will produce only one map from a given set of observa- 
tions, whereas others have several parameters that may be “fine tuned.” 
These are used most easily at an interactive graphics terminal, so that the 
visual effect can be gauged. 

Surfaces over a two-dimensional domain can be represented on a plotter 
or in print in one of two ways. Both are illustrated in Figure 1.2. Most 
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users find it easiest to grasp the broad features from perspective plots and 
fine details from contour maps. With perspective plots it is often im- 
portant to pick the best view, and even with contour plots it is useful to 
have available interactively the facility to add lines at critical contour 
levels and to explore parts of the map in detail. Unfortunately these 
opportunities are denied to the reader, who has to accept my choices. 
The contouring algorithm used is discussed in Section 4.5. 

The first four sections discuss four conceptual approaches. Trend 
surfaces are the generalizations to more than one dimension of curve-fitting 
by least squares. The spatial analogues of spline interpolation are dis- 
cussed in Section 4.3. The other two methods are extensions of those 
used to forecast time series: moving averages and the Wiener~ Kolmogorov 
theory (often regarded as the “Box-—Jenkins method”). All methods 
are discussed in terms of two examples and compared at the end of 
Section 4.4. 


4.1 TREND SURFACES 


Multidimensional generalizations of polynomial regression go back to 
“Student” in 1914; however, they were popularized in the earth sciences by 
Grant (1957) and Krumbein (1956), about which time computers made the 
computational task less daunting. This is a smoothing technique—the 
idea is to fit to the data by least squares a function of the form 


Ky)= Z 4,,x'y' (4.1) 


r+s<p 


of which the first few functions are: 


a flat 
at+bx+cy linear 
a+bx+cy+dx*+exy+fy? quadratic 


These formulas cover two dimensions. They have obvious extensions to 
three or more dimensions. The integer p is the order of the trend surface. 
There are P=(p+1)(p+2)/2 coefficients, which are normally chosen to 
minimize 


N 
= {Z(x,)-f(x,)}’ (4.2) 
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Linear, quadratic, and cubic fits to 10 points are illustrated in Figure 4.1. 
Notice that a cubic surface has 10 parameters and so provides an exact fit 
to the data. An undesirable feature of trend surfaces is shown, a tend- 
ency to wave the edges to fit points in the center. This effect is well 
known in polynomial regression, but is more severe in two or more 
dimensions where there is more boundary to be affected. 

We can rewrite (4.1) to make (4.2) a standard least-squares or multiple 
regression problem. If we label 1, x, y, x7, xy, y?,..., aS f(x)... fp(x) 
and the coefficients as £,,...,Bp, the problem is multiple regression of 
Z(x;) on (f,(x;),---, fp(x;)) and could be given to a statistical package. 
Now polynomial regression is well known as an ill-conditioned least-squares 
problem that needs careful numerical analysis, and this is equally true of 
polynomial trend surfaces. It is desirable to rescale D to about a square 
with sides [—-1,+1] to avoid extremely large or small values in /,(x). 
Orthogonal polynomials are often used for polynomial regressions, espe- 
cially with regularly spaced data. This approach has been extended to 
trend surfaces by Grant (1957) and Whitten (1970, 1972, 1974a). Modern 
multiple regression techniques (for which see Chambers, 1977 or Seber, 
1976) rely on orthogonalizing the functions f,,..., fp and make the orthog- 
onal—polynomial technique unnecessary. Suppose we write the surface as 


Aix) 
Z(x)=f(x)"Bte(x), A(x)=] (4.3) 
fp(x) 
and 
f(x, ‘a Z(x,) 
Zy=FBte, F= : fs Zep 
f(x)” Z(xy) 


These algorithms find an orthogonal matrix Q (Q'0=QQ7=1) and an 
upper triangular matrix R so that 


or=R-| | Oinaicky eee Reee (4.4) 


OR 


). (a), (8) 


d). (c), (2) Quadratic trend surface. 


ig. 4.1 (continue 
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Then the least-squares estimate of B is found by 
= 2 
T 
> (Z(x,)-£(x,)"B) =(Z, — FB) (Zy — FB) 
! 


=[Q(Z, —FB)]"[ O(Zy — FB)] 
=[QZ, ~ RB]"[ QZ, —RB] 
=¥J Y, +(Y¥, —RB)’(Y, — RB) (4.5) 


where 
Y, 
QZ, = y, |’ (Yi) px (Y2 cv Pyx1 
2 


Provided R has full rank, we obviously get a minimum sum of squares 
when Y, = RB, 


B=R-'Y, (4.6) 


Since R is upper triangular, it is easy to solve (4.6) for (Bp, Bp_,,-.., B,). 
The rank of R is that of F, so our condition is that f\,..., fp be linearly 
independent when evaluated at (x,,...,x,) and principally excludes all 
data points lying on a line. We will assume that it is satisfied. 

The usual statistical model associated with (4.3) assumes that the “er- 
rors” e(x) are independent Normally distributed random variables with zero 
mean and variance o? for all x. This is rather unrealistic, for we would 
expect deviations from a trend surface to be positively correlated over 
short distances. We will find better assumptions in Section 4.4, but if for 
the moment we accept this distribution, we find that the covariance matrix 
of the B’s is 02(F7F)~!=o2(R™R)~'. Thus the variance of f(x)"B, the 
value of the surface at the point x, is 


a? || R~7K(x)|)? (4.7) 


where elle is the squared length of a vector and R~? is shorthand for 
(R7)~'=(R7')?. From this model we can perform the usual F tests of 
multiple regression to see whether to increase or decrease the order of the 
trend surface. But if the “errors” are positively correlated, they jointly 
carry less information than the theory suggests and we would usually be 
led to fit a surface of too high an order. 
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There has been much discussion on the effect of the pattern of the data 
points on the fitted trend surface. Davis (1973, pp. 349-352) illustrates 
some experiments with samples from polynomial surfaces that suggest that 
whereas it is important to have data points throughout D, it does not 
matter much if they are clustered. Gray (1972) and Robinson (1972) 
concur. It does seem intuitively obvious that when the fit of the surface is 
not good, clusters of points are equivalent to weighting that part of domain 
D near the clusters to have a good fit. An uneven distribution of clusters 
can distort the fitted surface. 

The astute reader may wonder why we chose only the full linear and 
quadratic surfaces rather than introduce the second-order terms one at a 
time. The fitted surface should be invariant under rigid motions, since 
the coordinate scheme chosen is purely a convenience and the fitted 
surface should not depend on it. Cliff and Kelly (1977) consider invariant 
trend surfaces and find some invariant combinations of the parameters, 
but only the full surfaces are invariant. If combinations of terms are 
allowed there are a few other invariant surfaces, for instance 


atbx+cyt+d(x* +y?) 


These exceptional invariant surfaces are all of even order and have the 
terms of highest degree of the form a(x* +y7)’. 

Other families of surfaces have been suggested but are apparently rarely 
used. The main suggestion is a limited Fourier expression 


f(xy)= & [ a, ;sin(iw,x) sin( jo, y) +b, ;sin(iw,x) cos( jw, y) 
O<i,j<r 


+¢,,cos(iw,x)sin( jw, y) +d, ,cos(iw,x)cos( jw, y) | (4.8) 


Such a surface is not invariant and has a large number of coefficients 
[4r?+4r+1, since sin(0)=0], so even with r=3 there are 49 coefficients 
plus w, and w, to be fitted or chosen. With small values of r the surface 
shows clear ripples at the fundamental frequencies w, and w, in the x and y 
directions and so is really only suitable for periodic phenomena. With 
data on a rectangular grid this model is much more useful. In particular, 
we shall see in Chapter 5 that there are dramatically better computational 
procedures for that case. 

By itself, trend surface analysis is best at showing broad features of the 
data. Its main use, however, is either to remove those broad features to 
allow some other technique to work on the residuals or as part of the 
computation for the stochastic process prediction approach of Section 4.4. 
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4.2 MOVING AVERAGES 


A very simple way to smooth or interpolate is to predict the value of a 
surface by a weighted average of the values at the data points. These 
weights are chosen as a function of the distances: 


Z(x) = DAZ(x;) DA, =1, A, <w(d(x,x,)) (4.9) 


Clearly, the surface will interpolate if w(d)—>00 as d+0. Common choices 
of w seem to be d~’, e~%4, and e-*”” 

If we suppose w is smooth, clearly Z(x) is differentiable, except possibly 
ata data point. For points at distances from x, small compared with the 
interpoint distances, 


N N 
Z(x)—Z(x,)= Ba(zts,)-z0)) | / “+ Zo 


where w,; =w(d(x,x;)) and we can regard all terms except w, as constant. 
Thus Z is differentiable at x, if and only if d/w(d)-»0 as d0. The same 
argument applies to all other data points. There is also a constraint on 
w(d) for large d. Suppose the region D is expanded. We would expect 
(for a homogeneous pattern of data points) about const- dAd data points 
from distance d to d+ Ad from a point x ED, and these contribute about 
const: dAd-w(d) to the total of the weights. Thus unless {;°tw(t) dt is 
finite, the value Z(x) is not an average of local values, but depends entirely 
on how large D is chosen to be. We need w(d) to decay faster than d =e 
for large d. The effect of violating either condition is to give a fairly 
smooth surface with sharp spikes at the data points to ensure interpolation. 
Figure 4.14 gives an example. Usually w(d) is chosen to be zero beyond 
some arbitrarily chosen radius. 
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°° ° 
x 
200 Fig. 4.2 Susceptibility of moving averages to clustered data points. The 
oo 10 clustered points will dominate the rest for estimation at x. 


There are further drawbacks. If the data points lie on an inclined 
plane, Z will be nothing like a plane. The method is susceptible to 
clusters in the data points, which in mining usually occur at “high” points 
on the surface (see Figure 4.2). One way out is to estimate the value at x 
from the few nearest points in each quadrant centered on x. The problem 
of planar data is worse for a quadratic surface, for Z(x) is bounded above 
and below by the maximum and minimum of the data values and so 
cannot follow a quadratic peak. 

These problems are addressed by the distance-weighted least-squares 
method of Pelto et al. (1968) and McLain (1974). The idea is to solve the 
weighted least-squares problem 


min 5) wo(d(x,x,))[ Z(x,) —£0e,)"B) (4.10) 
1 


for each point x, and let Z(x) be the fitted value f(x)"B(x). Weighted least 
squares can be performed by a simple modification of the procedure in 
Section 4.1—form a diagonal matrix W with entries \/w(d(x,x,)) and 
apply these procedures to WZ, and WF. Distance-weighted least squares 
is also vulnerable to clusters. We still need conditions on w(d). 
w(d)—»0o for d—>0 ensures interpolation, whereas d*w(d)—0 as dc 
gives a “local” character as before. In general, the surface is as many 
times differentiable as 1/w(|u]) is as a function of wER. For w(d)~d ~? 
for small d the surface will be differentiable up to the integer less than p 
unless p is even, in which case the surface is infinitely differentiable (the 
case treated by Pelto et al., 1968). The following argument is a proof for 
a linear surface in one dimension. It extends to the general case, but 
becomes notationally complex! 

Parametrize the line as f(X)=a+b(X—x). Consider x near x,, say. 
Then (4.10) becomes 


min} >) ,(Z(x;)—a—)] x; —x])’+w,(Z(x,)-a)! 


a,b i>l 
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Differentiating with respect to a and 6 and eliminating b we find 


w,Z(x,)+ } o(1+C[x—x,])Z(x;) 


z i> 
a= 
w,+ y o+c> w,(x—x,;) 
i>l i>l 
where 
y w(x; -x) 
_ ipl 
co. 2 
> w,(x;—-x) 
i>l 


Thus since Z(x)=4, 


B(x) — 2(x)-Blx—x,)~ > > o(1+C[ x—x,]){Z(x;)-Z(«,)} 
PLi>l 


(4.11) 


The formula for 6 shows that it is effectively constant as xX—»x,, so the 
differentiability results follow from (4.11). Indeed, the clue to the general 
problem is to note that as x->x,, all the coefficients approach the solution 
of the problem (4.10) with the surface constrained to pass through Z(x,). 

McLain suggested w(d)=(e ~“’/*)/(d? +e), where e is a small constant 
to avoid numerical overflow, and a is the average squared nearest-neighbor 
distance. 
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Many one-dimensional interpolation techniques are based on fitting some 
curve in each of the intervals between the data points and choosing the 
parameters of the curve to give continuity of a certain number of deriva- 
tives at each data point. These are usually called spline methods. This 
section deals with the attempts at defining spatial splines, which are a 
much harder subject. Whereas the data points in one dimension divide 
the real line into intervals, it is not obvious how to use our data points in 
the plane to divide D into regions. Figure 4.3 illustrates the two basic 
constructs. To each point we associate a Dirichlet cell (also known as a 
Voronoi or Thiessen polygon), which is that part of D that is nearer to that 
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data point than any other. Green and Sibson (1978) provide an algorithm 
to construct these cells, which they refer to as tiles. (Their program was 
used in constructing all the Dirichlet tessellations in this book.) From the 
Dirichlet cells we can form the Delaunay triangulation, joining points for 
which the associated polygons have an edge in common. These constructs 
are well known in packing theory; see C. A. Rogers (1964) and Miles 
(1974a) for some properties. Other triangulations have been proposed. 
Sibson (1978) has shown that the Delaunay triangulation uniquely achieves 
the Lawson criterion (cf. Lawson, 1977). Each pair of adjacent triangles 
forms a quadrilateral. Lawson required that the smallest of the six angles 
in the two triangles should be larger for this division of a convex quadri- 
lateral] than that given by the other diagonal, as explained in Figure 4.4. 
An algorithm for this triangulation is contained in Akima (1978). Another 
criterion, attributed by McLain (1976) to Pitteway, is that any point within 
any triangle should have as its closest data point one of the vertices of that 
triangle. Although McLain gives an algorithm, there are patterns of data 
points for which no triangulation is Pitteway. Note that there are degen- 
erate cases in which the Delaunay—Lawson triangulation is not unique, 
and the formal definition given above does not give a triangulation. 
Figure 4.5 illustrates such a pattern. 

The oldest and simplest methods based on tessellations are the method 
of polygons of influence, in which the surface is assigned over each tile the 
value at its defining data point, and fitting a plane to each triangle as 
described by Bengtsson and Nordbeck (1964). Both methods produced 
unacceptable surfaces but could be computed by hand. Note that the 
Delaunay triangulation is defined only within the convex hull of the data 
points and so usually does not cover D. 

The method proposed by McLain uses the distance-weighted least- 
squares technique. He calculated a surface f\(x), f(x), A(x) at each 


Fig. 4.4 The Lawson criterion. The left-hand triangulation is chosen because angle a is 
larger than angle £. 
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(b) (c) 


Fig. 4.5 Degeneracy in the Delaunay triangulation. The formal definition gives (a). Both (b) 
and (c) are valid triangulations. 


(a) 


vertex of a triangle by distance-weighted least squares, and then used 


wm Mi) Si) + wD (X) + 5 f(0) 
w(x) + w,(x) + w,(x) 


F(x) (4.12) 


within that triangle. (He used quadratic surfaces f(x), but that is not 
essential.) For the notation needed, see Figure 4.6. For continuity of the 
surface, w,(x) vanishes on the edge opposite vertex i. McLain claims in 
his correction note that the fitted surface has continuous derivatives of 
order up to and including (7— 1) if 


w(x) = d/'(d7 d3, +433) (4.13) 


and w, and w, are found by permuting the vertex labels. Whether this 
surface interpolates depends on the weighting used to find the distance- 
weighted least-squares surface at the vertex. As before, we need to give 
infinite weight to the vertex to ensure interpolation. 

Sibson (1980b,c) has described a method he calls “natural neighbour- 
hood interpolation,” which is closely related to McLain’s. For data points 
x, and x, define A,(x,) to be the proportion of the total area of the tile of 


BE 
3 2 
d3y doy 


Fig. 4.6 Notation for McLain’s weighting function. 
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x, of the subset of that tile for which x, is the second-nearest data point. 
(The nearest data point is x, by definition.) Then A,(x;)=0 unless the 
tiles of x; and x, have a common side. The functions A,, can be extended 
to arbitrary points x by applying the definitions to the tessellation obtained 
by adding x to the data points. A simple interpolator is obtained by using 
the A,’s as the weights in a moving average, obtaining 


So(x)= DA, (X)Z(x,,) 


Sibson’s preferred solution is to fit planes f,,..., fy by weighted least 
squares, with weight {A,(x,,)/d(x,,x,)°} for the value at x, when fitting f,. 
(These linear functions through each data point replace McLain’s quadratic 
functions.) Combining these functions with weights proportional to 
A,,(x)/d(x,x,,) gives f*(x). The factor A,,(x) gives this interpolator a local 
character, corresponding to McLain’s sum over the vertices of the 
Delaunay triangle in which x lies. The final interpolator is a constant 
linear combination of f, and f* chosen to ensure that restricted quadratic 
surfaces of the form 


atbxt+cyt+d(x? +y?) 


are interpolated exactly. 

Akima (1978) fitted quintic (Sth order) surfaces within each triangle. 
Such a surface has 21 coefficients. Eighteen conditions are imposed to fit 
the surface and first and second derivatives (Z, 0Z/0x, 0Z/ 
dy,07Z/dx*, 8°Z/dy?,d*Z/axdy) at the three vertices. The remaining 
three conditions are that the derivative of the surface perpendicular to 
each side should be a cubic function of the distance from the side. In 
general, this would be a quartic, so this gives one condition per side. A 
cubic is determined by the first and second derivatives at the two ends of 
the side, so this condition ensures that the derivative perpendicular to a 
side is continuous across the side. Similarly, the value of the surface 
along a side is a quintic function of the distance along the side and so is 
determined by the values and first two derivatives along the side at the two 
vertices. The net effect is to achieve a continously differentiable surface 
within the triangulation. The surface can be extrapolated beyond the 
convex hull of the data points as shown in Figure 4.7. Within DABE we 
take a surface that is quintic along the direction of the side AB and 
quadratic perpendicular to AB, defined by the 12 conditions at A and B, 
whereas in EBF we take a quadratic surface defined by the values along 
BE and BF, or, equivalently, by the six conditions at B. 
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Fig. 4.7. Extending Akima’s surface. 


There are several ways to find the required first and second derivatives. 
It is possible that the first derivatives could be given; from drilling samples 
it may be possible to estimate the slope of a rock layer. Otherwise, one 
could fit a quadratic surface by distance-weighted least squares and use the 
derivatives of the fitted surface. Akima’s preferred method is to take at 
each vertex nearby points in pairs and to form the normal to the plane 
through the vertex and the two points. These normals are averaged and 
the resultant vector is used to define the derivative plane at the vertex. 
Second derivatives are formed by applying the same technique to the 
estimated first derivatives. 

Akima’s method has no flexibility at all. It seems computationally 
reasonably fast and to yield pleasing surfaces. An algorithm is available 
on magnetic tape in the CACM series. 

Powell and Sabin (1977) start from the premise that piecewise quadratic 
surfaces are good for contouring. Marlow and Powell (1976) had given a 
program that plots contours within a triangle by quadratic interpolation 
given values at the vertices and the midpoints of the sides. (This point is 
discussed in Section 4.5.) A triangle is divided into 12 subtriangles as in 
Figure 4.8. Then there is a unique surface, quadratic within each subtri- 
angle with a continuous first derivative, with 12 parameters fixed by the 
surface values and slopes at the vertices (nine conditions) and the deriva- 
tive perpendicular to the side at each of P, Q, and R. Such piecewise 
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Q 


Fig. 4.8 Subtriangulation of Powell and Sabin. 


quadratic surfaces are then continuous across a triangulation. (The reader 
is referred to the original paper for proofs.) The derivatives at the data 
points, if not given, can be estimated in any of the ways suggested for 
Akima’s method. Powell and Sabin suggest taking the normal derivative 
at the midpoint as the average of the two normal derivatives at the two 
ends of the sides. This method again has no flexibility, and it is not clear 
how to extrapolate beyond the triangulation. 

Akima (1978) refers to another piecewise quadratic method due to 
Whitten and Koellering (1975) that does not give a continuously differen- 
tiable surface. 

It is not necessary to use a triangular grid, and so authors, including 
Hessing et al. (1972), have suggested forming an irregular rectangular grid 
from the data points. Such a grid can then be reformed into a regular 
rectangular grid by an affine transformation within each cell and the 
methods of Section 4.5 for interpolation from regular data applied. There 
is no known way to uniquely form this irregular grid. The techniques 
used by humans are too vague to program! 


4.4 STOCHASTIC PROCESS PREDICTION 


The analogue for spatial processes of Wiener—Kolmogorov prediction 
theory has been developed and used mainly by Matheron and his school in 
the mining industry. He christened the method “kriging” after D. G. 
Krige. This exposition takes a different point of view from the usual 
treatments; for example, those of Matheron (1965), Delfiner (1976), David 
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(1977), and Journel and Huijbregts (1978). The approach and most of the 
evaluations are new, although foreshadowed by Whittle (1963b, 
Chapter 4). 

The main distinction between this family of methods and those dis- 
cussed so far is that the correlation between values of the surface at short 
distances is explicitly taken into consideration. This remark also applies 
to values at data points, so the “weight” of each point in a cluster is 
automatically reduced. Until further notice we assume that the covari- 
ance function C(x, y) is Known; no stationarity condition is needed. Con- 
ceptually we divide the surface Z(x) into a trend and a fluctuation 


Z(x) =m(x) + e(x) (4.14) 


The distinction between the trend m and fluctuation e is not clear cut. 
Generally we think of the trend as a large-scale variation, regarded as 
fixed, and the fluctuation as a small-scale random process. Then C is the 
covariance function of e as well as of Z. 

A useful analogue is with time-series analysis, in which various methods 
are used to smooth the series, to remove the systematic component (often 
by methods analogous to those in the previous three sections). This is 
equivalent to isolating our trend, which for some purposes may be the aim 
of the analysis. In time-series research, a large amount of effort has been 
spent on describing the distribution of e(x) via correlations and spectra, on 
building parametric models for e(x) and on forecasting. 

The importance of (4.14) is that we will predict the two components 
separately. For a smooth summary we might use 


Z(x) =m(x) 
whereas for interpolation we use 


Z(x) = h(x) + é(x) 


The theory for the prediction of e is precisely Wiener—- Kolmogorov theory 
for a time series with a finite history. 


Known Trend 


Suppose first that m(x) is known, so that we can work with W(x) = Z(x)— 
m(x). (W is the « process at present.) We will consider only linear 
predictors 


W(x) = DA ,W(x;) =)? Wy 
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and choose that which minimizes 
E(W(x)- W(x)) =var(W(x)) —22A,C(x;,x) + ZA,A ;C(x,,x;) 
=0?(x) —2\7k(x) +7KA (4.15) 


where K;, =C(x,,x;), k(x) =(C(x,x,)), a column vector, and, of course, A 
depends on x. Now (4.15) is a quadratic form in A that can be minimized 
by finding its stationary point. Differentiating with respect to each A,, in 
turn, we find 


Kh=k(x) (4.16) 
o2(x)=min E(W(x)— W(x)) =07(x)—A7k(x) 


To solve (4.16) we need K to be nonsingular. Now K is a covariance 
matrix, and hence is nonnegative definite. It is invertible if and only if it 
is strictly positive definite, which we assume, and is no restriction in 
practice. Then our predictor becomes 


W(x) = [WK ~'] k(x) (4.17) 


o2(x)=02(x) —k(x)"K ~'k(x) (4.18) 


We interpret (4.17) as follows. The optimal (that is, minimum mean 
square error linear) predictor is that linear combination of the functions 
C(x,x,) that passes through the data points. Clearly, the optimal predic- 
tor at a data point x; must be Z(x;); (4.16) is solved by A; = 1, A, =0 for all 
J#i. Thus W(x) interpolates W. 

Computationally, (4.17) is convenient, for WK ~' is a row vector that 
can be formed once, after which W(x) can be evaluated for each x with N 
additions and N multiplications. To form K~' we use the Cholesky 
decomposition, which constructs a unique lower triangular matrix L with 
K=LL", Then if y=K~'W, we can find y by L(L’7y)=W,, easily in 
order N* operations since both L and L’ are triangular. To evaluate 
(4.18) we use 


Le=k(x) 


N 
sp o%)-(3ei] (4.19) 
l 
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Note that we need order N? operations to form e and hence oj for each x. 
The most expensive operation is the Cholesky decomposition, which is 
needed only once and takes order N? operations. 


Finding a Parametric Trend 


Suppose we wish to find m, and we assume that it is a trend surface of the 
form f(x)’B discussed in Section 4.1. We now know the structure of the 
“errors” e€(X,),-..,&(X,) SO we Can use a more appropriate technique than 
least squares. Standard statistical theory shows that the covariance ma- 
trix for B is minimized for the generalized least-squares estimate, which 
solves 


min (Zy — FB)'K ~'(Zy — FB) 
= 2 {Z(x,)—£(x,)"B}(K ~')y{Z(x,)-f(x,)"B} (4.20) 
us 
using the notation defined at (4.3). Now (4.20) is 
min (Zy —FB)'L~7L~'(Zy — FB)=min|| L~'Zy — L~'FB || 
which reduces to a least-squares solution of the transformed problem 
L7'Zy =L7'FB+n (4.21) 
Again, the lower triangular nature of L makes this an easy computation. 
The method automatically compensates for clusters of data points, for the 


values in a cluster will be highly correlated and so downweighted in the 
least-squares problem (4.21). The covariance matrix for B is 


(FTK “'F)~'=(FT™L~TL7'F)-'!=R7!R-T 


for the reduction R of Das 3 found by the least-squares algorithm. Thus 
the variance of /1(x) =f(x)’B is given by ||g||, where 


R7g =f (x) (4.22) 


Yet again we benefit from R’ being a lower triangular matrix. 
Prediction with an Unknown Trend 


From our times-series analogy the obvious way to combine the last two 
subsections is to fit a trend surface m(x), form W=Z—m, and then predict 
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W as if the trend had been known. That is 
Z(x) =m(x) + W(x) (4.23) 
o2(x) =var( W(x)) + var((x)) (4.24) 
In fact, (4.23) gives the optimal predictor, but (4.24) is wrong. This is 
easily seen, for (4.24) predicts of(x,;)>0 at a data point, whereas Z(x,)= 
aE need to modify our definition of an optimal predictor slightly. We 
look only at unbiased predictors for which 
E(Z(x)) =m(x) 
From this 
E(Z(x)) = E(2»,Z(x,)) = Zvjm(x;) = Zw, (x,)B 
should equal f7(x)B for all B. We deduce 
v™F=f(x)" (4.25) 
Then 
E(Z(x) — Z(x))’ =var( Z(x)) —2v7k(x) + 97K (4.26) 


as before. To minimize (4.26) under condition (4.25) we introduce 
Lagrange multipliers p and minimize 


var( Z(x)) —27k(x)+»7Kv+2(f(x)’ —v™F )p 


This is again a quadratic form in v, so we differentiate to find a stationary 
point and find 


Kv=k(x)+ Fp 
Fy=f(x) 
o7(x)=var( Z(x)) —»7k (x) + pf (x) (4.27) 


We now show that v7 Z, is Z(x), as defined at (4.23). The latter is clearly 
unbiased, for E(™(x))=m(x), E(W(x))=0, so we need only check the first 
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condition of (4.27). Let A=(F7K —'F)~! 


m(x)=Z1 K ~'FAf(x) 
W(x) = WK ~'k(x) =Z2[ I- K ~'FAF™| K ~'k(x) 


2(x) = h(x) + W(x) =Z5| K ~'k(x) + K ~'FA{ f(x) — F7K ~'k(x)} | 
We identify v as [--- ] and can choose 
u=A{f(x)— F7K ~'k(x)} 
to satisfy (4.27). The error variance becomes 


o2(x)= | var(Z(x)) — k(x)"K ~'k(x) | 


+[ f(x) —F7K ~'k(x)]"A[f(x)—F7K ~'k(x)] (4.28) 


The first term is var( W(x), but the second term is not var(i(x)) = f(x)7Af(x). 
The mean vector f(x)’B is adjusted to [f(x)” —A’F]B, where A is the vector 
of weights used to form W(x). 

If f(x) is merely a constant (4.23) and (4.28) are the procedure Matheron 
calls “kriging” [although he considers the direct solution of (4.27)]. In 
this case, the unbiasedness condition is that the weights v; should sum to 
one (there is no reason why they should be nonnegative), which connects 
the method with the moving averages of Section 4.2. The general case is 
called “universal kriging.” The following summarizes our computational 
procedure: 


1, Form K=[C(x,,x,)], L such that LL7=K. 


F(x). fp(x,) Z(x,) 
: y=]: | 
Si(xn)--- fo(Xw) Z(xy) 


3. Solve L~' Zy =L~'EB by least squares, reducing L~'F to R. 


2. Form F= 


4. Form Wy =[Z(x,;)—f(x;)"B], y such that L(L’y)=Wy. 


5. Predict Z(x) by y7k(x)+f(x)’B, k(x) =[C(x,x;,)] 
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with error variance given by 


of =07(x)—|lel|? +|Igl|? of =07(x)~Wk(x) + |lgl|? 
Le=k(x) or L(L™h)=k(x) 
Rg=f(x)—(L7'F)'e R g=f(x)—FTh 


The first alternative is slightly faster, the second smaller if the least squares 
algorithm destroys L~'F. In that case, h is the vector of weights. 


Smoothness of the Surface 


Suppose now that the covariance is homogeneous, so C(x,y)=C(x—y). 
The smoothness of the predicted surface is governed by the behavior of C 
for small h, as shown by (4.17). We have seen that Z(x) is always an 
interpolator, but this is achieved by a discontinuity at the data points if 
C(h) is not continuous at zero. In that case we have what Matheron 
called a “nugget” effect, which could be attributed either to sampling 
errors in the data values or to very small-scale irregularities in the surface. 
The distinction between the two cases is as follows. If we repeat the 
measurement at exactly the same place we find the same value only in the 
second case, but if we sample close to the original point we obtain a 
substantially different value in both cases. With measurement errors we 
would want to ignore the discontinuous points on the predicted surface, 
which we can do by defining 


k(x,)= lim C(h) 


for each 7. (This limit exists at least for isotropic covariances.) The 
realizations of a process with a nugget effect are very rough indeed, so we 
should not be surprised by discontinuities in the prediction. Let us 
assume that C(h) is continuous at zero. The smoothness of the surface is 
then governed by the smoothness of C(h) at zero. If 


|C(0)— C(h)|<clhi= a> (4.29) 


then Z(x) is continuous. For a Gaussian process 


a.s. 
jh[~*% sup |Z(x)- Z(y)| —0 y<a/2 (4.30) 
|Ix—y|<A 
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so the original surface is continuous if a>0, and n times differentiable if 
a>2n. [These results are given in one dimension by Neveu (1965), 
Proposition III.5.3 and Problem III.5.2. See also Cramér and Leadbetter 
(1968) Sections 4.2, 4.3, and 9.2. The multidimensional versions follow in 
exactly the same way.] From (4.20) Z(x) is continuous if M(x) and C(h) 
are, and Z is n times differentiable if (4.29) holds with a>n (and f is 
sufficiently smooth). Thus the predictor Z will be rather smoother than 
the original surface even at the data points. 

This interpolation approach is illustrated in Figures 4.9-4.11, for the 
data used in Figure 4.1. Figures 4.9 and 4.10 clearly illustrate the effect 
of the smoothness of C(h) on the surface. Both have 


C(x, y) = 1—exp— {d(x,y)/ro} (4.31) 


with 7 =5 for 4.9, 20 for 4.10. The plotting and contour programs are not 
able to follow the sharp peaks! With the larger value of rp the predicted 
surface is able to follow the linear trend except at the edges. Figure 4.11 
illustrates the prediction when a linear trend is specified and ry =5. 


Extensions 


We can consider the prediction of the average of the surface over D or 
over blocks (for instance, the average ore grade in a section of a mine). 


Sd 
oe 
eee 


Fig. 4.9 Prediction of the surface from the same 10 data points as Figure 4.1, within a 
100 x 100 unit square. Covariance given by (4.31) with 7 =5. 


(b) 
Fig. 4.10 As 4.9 with ry =20. 
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but with a linear trend specified. 


> 


Fig. 4.11 As 4.9 
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The only change will be to replace k(x) by the vector of covariances 
between the variable to be predicted and the sample values, and f(x) by its 
average over the block. It is clear from (4.23) that the prediction will be 
precisely the average of the predicted surface over the block. This proce- 
dure is known as block kriging and is mathematically equivalent to follow- 
ing interpolation by a moving average. It can be used as a way of 
smoothing. 

There is no reason why the data should be values at a point on the 
surface as we have assumed so far. They could for instance be averages 
over small blocks within D. Suppose that the ith observation is over the 
block B,. Then we would need to replace K and k(x) by the correct 
covariances: 


k(x)= 


f C(x,y)dy/area (B,) 
B 


‘ 


kK,=f J C(x.y) dx dy/ {area (B,) xarea (B,)} 


BiB; 
and f(x,) by fs f(y) dy/area (B;). 
Classes of Covariances 


So far we have made the unrealistic assumption that the covariance 
function C(x,y) is known. It is this assumption that has saved us from 
assuming homogeneity. However, to estimate C we must make this 
assumption, and usually we will need to assume isotropy as well to obtain 
a sufficiently compact description of the covariance. If there are a priori 
grounds on which to assume that one direction differs in character from 
the others (such as depth in a three-dimensional description of a mine) we 
might assume geometric anisotropy and take C(h) as a function of \/(h? + 
h3, +ah3), say, where a is an additional parameter. 

Convenient classes of isotropic homogeneous covariances are rather few. 
Fortunately, what matters most in interpolation is the behavior of C(r) 
near the origin, so we really only need to model the small-scale part of the 
covariance structure. However, we may well have no pairs of data points 
whose distance apart is in the critical range. This is a particular problem 
for a very regular pattern of sample points. In a mining or drilling 
situation we often will have dense sample points in a part of the study 
region. Even though this represents a “good” part of the region, we must 
assume that the local behavior there is typical. The prediction procedure 
essentially copies this local behavior elsewhere. 
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We have already used one parametric family of covariances, the ex- 
ponentials at (4.31), and have shown in Section 2.2 that this is a valid 
isotropic covariance for all r in all dimensions. It produces a continuous 
but non-differentiable predicted surface. 

Zubrzycki (1957) introduced a process found by counting the number of 
points of a Poisson process of intensity A within distance (R/2) of the 
sample point. Then 


C(h) =A meas(b(0, R)Mb(h, R)) 


where b(h, R) denotes the ball of radius R centered ath. This gives 


C(r)= o1-2/ Ev(1-5) +505) SR (4.32) 
0 r2>R 
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Fig. 4.12 Zubrzycki’s correlation function in two dimensions (a) and three dimensions (b). 
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in two dimensions and 


eee en aenre 28 r<R (4.33) 
0 r>R 


in three dimensions. Figure 4.12 illustrates these functions. Their behav- 
ior near the origin is very similar to an exponential function. Note that 
although the description is of a discontinuous surface, a Gaussian surface 
with this covariance is continuous by (4.30). The advantage of these 
models is that data points more than R from the point x being considered 
can be ignored, which can be used to reduce the computations. 

Whittle (1954, 1963a) considered models with 


o 


C(r)= PT PF | r/t>)’K,(r/1) v>0 (4.34) 


These functions are illustrated in Figure 4.13. The choice p=} is the 
exponential class; for O0<v< i, C(r) has an infinite derivative at 0, for 
v>} a zero derivative there. The class 


C(r)=o7 exp{ —(r/1m)"} (4.35) 


also has a very smooth behavior for small r. (It is unfortunately some- 
times called the “Gaussian” family!) We saw in Section 2.2 that these 
were valid covariances. 
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Fig. 4.13 Whittle’s family of correlation functions. The values of » are 2, 1, 0.5, 0.2 from top 
to bottom, all with ry = 1. 
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Fitting Covariances 


In contrast to time-series analysis, very little work has been done on fitting 
the parameters of any of these models. We have seen that small distances 
are probably most important for interpolation. The usual approach is to 
look at the empirical covariances. To do so we form c,,;=({Z(x,)— 
Z}{Z(x;)—Z} for all pairs of data points, and average all values for pairs 
(x;,x,) with x,;—x, in each of a set of intervals. For the isotropic 
case—the only one we shall consider in detail—we average c,, for pairs 
with d(x,,x,;) in each of the set of intervals. Figure 4.16a illustrates a plot 
of correlations in which the distances were divided into 100 equal-size 
intervals. Thus the point at r represents the average correlation between 
data points whose distance apart is in the interval (r—A/2,r+A/2) for 
A~0.09 km. Of course some intervals of distance include no pairs of data 
points, and then no correlation is plotted. A problem with such plots is 
that the variability decreases as the distance increases, for the average 
number of pairs in each interval is proportional to the distance. The 
right-hand points are averages of many more points than those at small 
distances. Unfortunately, the latter interest us most. The correlation 
functions used in the examples have been fitted to such plots by eye. It is 
not always easy to decide when to include a “nugget effect” and allow a 
discontinuity in the correlation function at the origin, or whether to choose 
a linear (4.31 and 4.32) or quadratic (4.35, 4.34 for »> ) behavior at the 
origin. 

In the absence of formal fitting procedures, the best way to assess the fit 
of a covariance function seems to be by cross-validation. Each data point 
is deleted in turn and its value predicted from the rest of the data, using 
the fitted covariance function. The prediction errors are then assessed. 
It is tempting to form the sum of squares of these errors. This may, 
however, not be suitable, for it may be dominated by a few data points 
that, because of their isolation, are hard to predict. Scaling the prediction 
errors by their standard errors and then forming a sum of squares biases 
the comparison in favor of covariances which give high standard errors by 
means of small covariance at typical interpoint distances. Even if a 
suitable summary could be found, its numerical minimization by altering 
the parameters of the models is probably too expensive for more than a 
few data points. 

A complication arises when the covariance has to be assessed from data 
to which a trend surface is to be fitted, for the correlation plot should be of 
the residuals from the surface, which is to be fitted by generalized least 
squares using the correlation function. This apparent circularity can be 
broken by an iterative procedure. First fit a surface by least squares (or 
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another guess at the covariance function), guess a covariance function, 
refit the surface using this, and refit a covariance function to the new 
residuals. If necessary, the process can be iterated. Forming correla- 
tions of residuals introduces a bias, for the correlation function of the 
residuals differs from that of the original e process. The difference should 
be small at short distances if (4.14) is really a separation into a large-scale 
trend and small-scale fluctuation. 


An Example 


The data are 51 measurements of height of the earth’s surface within a 
310-foot square, from Davis (1973, Table 6.4). Figure 4.14 illustrates 
interpolation by weighted averages, with weighting functions d~', d~?, 
and d@~*. The unsatisfactory nature of the first two choices is clearly 
seen. (Most of the small spikes up or down to the data points have been 
missed by the contouring program.) Figure 4.15 shows trend surface fits 


(a) 


Fig. 4.14 Weighted averages for topographic data within a 310-foot square. Weighting 
functions were (a) d~', (b) d~?, (c) d~*. 


(c) 
Fig. 4.14 (continued) 
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Fig. 4.18 Trend surfaces for topographic data (from Davis, 1973, Table 6.4) of orders 1({a) 


to 5(e). 
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Fig. 4.15 (continued) 


of orders 1 (linear) to 5 (quintic). The cluster of points at the middle back 
causes that part to be fitted rather too well. Figure 4.16 shows correlation 
plots for the data and residuals from second and fourth order surfaces, 
together with some covariance functions fitted by eye. The corresponding 
generalized least-squares surfaces are not shown, but differ only in that the 
cluster of data points is given less weight. Figures 4.17—4.21 show the 
predicted surfaces and the prediction standard errors for the five models 
fitted in Figure 4.16. Introducing a trend surface makes little difference 
to the prediction except at the corners, but does reduce the standard errors 
somewhat. The big difference is made, however, by the choice between 
(4.31) and (4.35), which have very different characters at the origin. The 
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Fig. 4.16 Correlation plots. (a) Data. (b) Residuals from quadratic surface. (c) Residuals 
from quintic surface. The covariances shown are (a) (4.31) and (4.35) with 7 =2, (6) (4.31) 
and (4.35) with 7p = 1, and (c) (4.31) with 7) =0.5. The abscissa is distance in units of 50 feet. 
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(c) 
Fig. 4.16 (continued) 


Fig. 4.17 Predicted surface (a, 5) and standard error of the prediction error (c) for (4.31) 


with ry) =2. 
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(b) 
Fig. 4.17 (continued). Predicted surface. 


quadratic behavior of (4.35) gives a much more “rounded” surface and 
much lower standard errors. The data give little help in making this 
choice, which needs external evidence. 


Conditional Simulation 


The prediction standard errors are little help in assessing the variability of 
nonlinear functions of the predicted surface, such as the area above a 
critical value, so it is helpful to have a means to simulate the fluctuations 
of the unknown parts of the surface. Suppose S is a simulation of a 
process with the covariance of the surface Z. Form a prediction S$ from 
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(c) 


Fig. 4.17 (continued). Standard deviation of prediction error. 


the values of S at the data points and 
S(x) = S(x) + (S(x)-S(x)} 
Z(x)=Z(x) + {Z(x)—Z(x)} 
c(x) = Z(x) + { S(x) — S(x)} 


Then c is a realization of a surface with the same covariance as S and Z; 2 
and S are both interpolators, hence c passes through the data points. To 
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Fig. 4.18 As Figure 4.17 for (4.35) with ry = 2. 


Fig. 4.18 (continued ) 


Fig. 4.19 As Figure 4.17 for 79 =1 with a quadratic trend. 
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(b) 
Fig. 4.19 (continued) 


find the covariance function of c: 
cov( c(x), c(y)) =cov( Z(x) +.5(x) — S(x), Z(y) + Sty) — S(y)) 
=cov( Z(x), Z(y)) +cov([ s— $](x), | s—$](y)) 
=cov( Z(x), Z(y)) + cov([ Z—Z](x),[Z-—Z](y)) 
=cov( Z(x), Z(y)) 


The second equality follows from the independence of S and Z, the third 
from the equality of the distributions of S and Z, and the fourth from 


cov(Z(x),[Z—Z](y))=0 forall x,y 


(c) 
Fig. 4.19 (continued) 


ee 


Fig. 4.20 As Figure 4.18 for 7) =1 with a quadratic trend. 
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0.5 with a quartic trend. 


Fig. 4.21 As Figure 4.17 for r 


71 


72 SMOOTHING AND INTERPOLATION 


(c) 
Fig. 4.21 (continued ) 


This is a characteristic property of an unbiased linear minimum mean- 
square error predictor and may be derived as follows: 


E(Z(x)—2(x)+aZ(y)) = E(Z(x)—Z(x)) 


+a7E(Z(y)) +2a£({Z(x)-Z(x)}Z(y)) (4.36) 


The left-hand side of (4.36) is minimized by a=0, which can only be the 
case if 


E({Z(x)— Z(x)}Z(y)) =cov( Z(x) — Z(x), Z(y)) =0 


because Z(x) is unbiased. 
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Nonlinear Problems 


The minimum mean square error predictor of Z(x) is always E(Z(x)|data). 
So far, we have only looked within the class of linear predictors. In 
general, this conditional expectation needs knowledge of the whole distri- 
bution of the process Z. However, for Gaussian processes conditional 
expectations are linear functions of the data, so minimum mean-square 
error predictors are linear and hence are found by the algorithms of this 
section. If we could find a transformation ¢ such that {(x)=@7 '(Z(x)) is 
a Gaussian process, we would be advised to predict { and use $(§(x)) as 
our predicted Z surface. Probably the only common example is when 
Z(x) is thought to be lognormally distributed. Then defining 


§(x) = log Z(x) 


each {(x) is Normally distributed, and it is a short step to assume joint 
Normality of ¢(x,),...,§(x,) and (x), which is the definition of a 
Gaussian process. However, we need the mean to be relatively constant 
or to act multiplicatively, for if 


§(x) =m(x) +(x) 
then 
Z(x)=e™Me™ = e™™e(x) 


where 7 is a Gaussian process and e(x) is lognormally distributed. 

We could attempt to assess the distribution of Z(x) by looking at a 
histogram of the data values. This may be confused both by the varying 
means (if a trend needs to be removed) and by the correlation between 
these random variables. Further research is needed. 

Mining engineers are interested in estimates of the numbers of blocks in 
a mine with average grade above the economic cutoff point and in the 
total amount of ore in such blocks. If Z, denotes the average of the 
surface within V, to give estimates of “probable” and “proven” reserves in 
an oil or gas field we need estimates of 


P(Z, 2 x|data) (4.37) 


In mining x is the critical grade, whereas in reserve evaluation we choose x 
to give conventional probability levels. We have estimates of the mean 
and variance of Z, given the data, namely Z, and oZ(V). In the case of a 
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Fig. 4.22 Confidence ellipsoid for Zy and Z,. The regression lines for Zy on Z,(---) and 
Z, on Z,(-:-) are shown. 


Gaussian process (with known covariance structure), the conditional distri- 
bution of Z, given the data is Normal with this mean and variance, so 
(4.37) may be estimated. However, whereas E(Z,)= Zy, var(Z,))< 
var(Z,,) [cf. (4.18)]. A typical plot of Ze vs. Zy is given in Figure 4.22. 
An interesting point to note is that the regression of Zy on Zvi is a line of 
unit slope, and for Gaussian processes this is E(Z AZ y) (since E(Z,,|data) 
depends on the data only through Z vy). Thus whereas if the cutoff grade 
is above the mean, Z y Will be lower than Z,, Z y does provide a reasonable, 
unbiased predictor of Z,, and the estimate of (4.37) should not be 
misleading. 

These points are discussed in more detail by Matheron (1976a,b) and 
Maréchal (1976). Their “disjunctive kriging” can be viewed as a series 
expansion method of finding transformations to a Gaussian process and 
seems quite open to distortion by sampling variability. 
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4.5 CONTOURING 


Contouring algorithms take as input either an array of function values on a 
fine grid or a function that will evaluate the surface at any given point. If 
function evaluations are cheap, a technique that follows a contour, evaluat- 
ing at points near to the contour as it goes, will be very accurate. The 
problem is to ensure that all the curves at a given contour level have been 
found. Since this method requires a large number of evaluations at each 
contour level, it is usually prohibitively expensive. 

The alternative is to use a grid, usually rectangular. Contours are 
interpolated linearly along the edges of the grid and followed across the 
grid. To ensure all branches of the contour are found the boundary 
should be searched, then one set (horizontal or vertical) of internal edges. 
Previously traced branches can be found either by marking each edge 
visited or by checking an edge intersection against a list of all previously 
found points on contours at that level. Two problems arise when actually 
following contours. Figure 4.23 illustrates the three cases which can arise. 
At a saddle point [case (c)] a choice must be made. This can be done by 
taking the shorter of the total lengths of each pair of routes or, equiva- 
lently, by fitting the hyperbola a+bx+cy+dxy within that square. The 
second problem arises when a contour actually passes through a grid point. 
This is avoided by perturbing downwards by a negligible amount all values 
very close to the contour level. An algorithm on these lines has been 
given by Snyder (1978). The contours in this book were drawn by a 
similar algorithm with the added refinement of passing a smooth curve 


(b) 


Fig. 4.23. Three cases in contour following. 
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through the edge intersections (cf. McConalogue, 1970, 1971). This refine- 
ment may cause contours to cross. For this reason, or because linear 
interpolation is not good enough, we may need to interpolate to a finer 
grid. The rest of this section discusses interpolation from data on a 
rectangular grid. 

We can use the methods of the last three sections to interpolate within a 
rectangular grid, but simpler methods are available. We could use spline 
interpolation. The usual idea is to interpolate in one grid direction, and 
then the other, say from an mXn to kmxXn to kmxXiIn grid. Then we can 
use one-dimensional spline interpolation. The choice of intermediate 
grid, kmXn or mXin, does not matter (De Boor, 1962; Bhattacharyya, 
1969, 1971; Whitten and Koellering, 1973). 

There are simpler versions of the procedures of Akima and Powell. 
Akima (1974) uses a polynomial function in each cell of the grid, of the 
form 


ris 
> 4,,x'y 
r,s<3 


This is not a cubic polynomial, but is the form produced by a double 
application of cubic splines. However, Akima does not use the spline 
criterion of fit, but estimates Z,dZ/dx, 9Z/dy and 3°Z/dxdy at each 
corner of the cell and uses these 16 values to find the 16 coefficients. The 
resulting surface has a continuous first derivative. The proof is similar to 
that for his general scheme. The derivatives are estimated by local 
differences. Akima gives an example to support his claim that his method 
is preferable to cubic splines when contours run diagonally on the grid. 
Powell uses the triangulation shown in Figure 4.24. Data are then 
available at six points on the boundary of each triangle, and a quadratic 


Fig. 4.24 Triangulation used in Powell’s contouring 
algorithm. 
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surface is fitted within each, giving an overall surface with a continuous 
first derivative and piecewise elliptical contours. These are esthetically 
pleasing, but are produced in small pieces and very slowly (one-tenth the 
speed of the algorithm used for my figures). Usually some form of 
postprocessing will be necessary to follow a single contour from triangle to 
triangle. 
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CHAPTER 5 


Regional and Lattice Data 


Data on a regular lattice usually arise from a planned experiment or from 
a systematic sampling scheme. Agricultural field trials are carried out to 
compare varieties of crops. A field is divided into a number of rectangu- 
lar plots (of the order of 400) separated by narrow strips, and varieties or 
treatments assigned to the plots by some experimental design. Clearly, we 
would expect a variation in fertility across the field that will be reflected as 
correlation in the “errors” of the linear model for yield associated with the 
experimental design. “Uniformity trials” were designed to investigate 
fertility variations. They are field trials with the same variety and treat- 
ment in every plot. One of the most famous is that reported by Mercer 
and Hall (1911). Fairfield Smith (1938) lists many others. Of course, the 
usual remedy is the use of blocks, as this was the experimental situation for 
which the analysis of variance was created. In Section 5.3 we investigate 
an alternative approach that models the correlations. 

Other experiments performed on rectangular and triangular lattices are 
those designed to investigate the efforts of competition on plant growth. 
Typically young trees are planted at the lattice points, experimental de- 
signs being considered by Martin (1973) and Veevers and Boffey (1975). 
Lattice data also occur from systematically sampling a continuous surface 
(as advocated in Chapter 3) and from the quadrat sampling schemes of 
Sections 6.3 and 6.5. 

Lattice data can be considered to be the two-dimensional analogue of a 
time series. This probably explains why it has received so much attention, 
for it is much rarer than data from irregularly spaced points or data on 
point patterns. In Section 5.1 we consider the multidimensional generali- 
zation of spectral analysis and in Section 5.2 parametric models related to 
autoregressive processes. 

Regional data are sometimes regarded as occurring on an irregular 
lattice. Autoregressive models can be defined in a sufficiently general 
framework to be applied to observations on regions the connections 
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between which can be defined, for instance, by transport costs. This is 
done in Section 5.2, and the results applied in Section 5.4. Most of what 
geographers know as spatial statistical methods are multivariate rather 
than spatial in the sense of this volume, but the modeling of regression 
residuals by autoregressive processes provides a link with the rest of this 
chapter. 

The reader is warned that Section 5.2 uses more matrix algebra and 
statistical theory than is needed elsewhere in this book. 


5.1 TWO-DIMENSIONAL SPECTRAL ANALYSIS 


Throughout this section we assume that we are given data (Z,,,) on a NX 
by NY rectangular grid with N=NXXNY observations in total. By 
analogy with time-series analysis, we form a correlogram r and periodogram 
I by 

M, M, 


l es = 
e(r,s)= ad D (Ze -Z )(Zusrcrs-Z) (5.1) 
Why tg 
m, =max(1,1—r) m,=max(1,1—s) 


M,=min(NX,NX—r) | M,=min(NY,NY-s) 
r(u, v) =c(u, v)/c(0,0) (5.2) 


2 
N |} 7\ > thu, —ipo 
LORS W & (Zw Z)e Ate . (5.3) 


We can think of c(r,s) as the average covariance over all pairs of 
observations whose coordinates differ by the vector (r,s). [There are 
(NX —|r|)(NY—|s]) such pairs, suggesting this as a divisor in place of N. 
Alternative forms replacing Z by row and column means have also been 
used.] The functions c and r are symmetric in the sense 


e(—-r, -s)=c(r,s) 
but 
c(—r,s)#e(r,5) 


in general. We will plot only the right half-plane; the left half-plane can 
be found by a half-turn rotation. 
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The periodogram has the same symmetry properties as the spectral 
density; we will plot the range (—7<A<a7,-a7<p<m7). Wecan define / 
in real numbers by 


1A, W=—> > Dd c(u, v){cosAucospo—sin Ausin pv} 
At” \ul<Nnx |v] <NY 


Suppose that Z is a realization of a homogeneous stochastic process with 
mean m, covariance C(h) and spectral density f(w). Then if h=(r,s), 
X=(u,v), o=(A, 1), 


NE(c(h)) = 3 E| {Z(x)—m-(Z—m)} {Z(x+h)—m—(Z—m)} | 


= > (c(h) +var(Z )} -£|(Z—m) > (Z(x+h) +Z(x)-2m) | 


= N{C(h) —var(Z )} (5.4) 


when |r|«NX, |s|<NY. For large N we will usually also neglect var(Z). 
Then (5.4) shows that c(r,s) is an approximately unbiased estimate of 
C((r, s)) for small r and s. However, the sampling variance depends on 
C(h) and neighboring values of the correlogram are substantially corre- 
lated. As in time-series analysis, the periodogram has a more satisfactory 
asymptotic sampling theory. 

We say w=(A,p) is a Fourier frequency if X and p are multiples of 
2a/NX and 22/NY, respectively. Then asymptotically (for large N) 
I(w)/f(w) are independent at different Fourier frequencies with a stan- 
dard exponential distribution, except at (0,0), (0,7), (7,0) and (7,7) 
(1(0,0)=0 and I(w)/f(w) has a x? distribution at the other three frequen- 
cies). This result is a direct extension of the one-dimensional version 
(Bloomfield, 1976, p. 189). Thus for almost all Fourier frequencies w 


E(I(w))~f(@) — var(I(@))=f(@) 


So although J(w) estimates f(w), is is not a good estimator. Again 
borrowing ideas from time series we smooth /(w) by a window W and use 


g(o)= f W(2)Mw—2) do (5.5) 
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In our examples we have taken W as a bivariate Normal density 
W(w) = {exp— || ||?/2a?} /27a? (5.6) 


where a is an adjustable positive parameter. The approximate mean and 
variance of g(w) are 


E(g(w))~f(w) (5.7) 


var(g(«))~4n7f(co)” f Ww)? deo/N=nf(w)/Ne® (5.8) 


It would appear from (5.8) that we should take a as large as possible. 
However, the approximations are only good when /f(w) is reasonably 
constant over the range in which W(w) is large (say over a radius of 2a in 
our example), which will limit the value of a. From (5.7) and (5.8) we see 
that taking logarithms will stabilize the variances, so we usually plot our 
estimate g of a spectral density on log), scale. Then 


log 98(@) = logigf(w) +7 (5.9) 


where 7 is usually taken to be approximately Normal with mean zero and 
variance 


4n? f W(w)* dw(log,e)’/N = 2(log,e)°/Na* 


The computation of a smoothed spectral density estimate is done using 
the Fast Fourier Transform (Bloomfield, 1976, Chapter 4). The mean is 
removed from the original array and this is extended by zeroes to an 
NX2XNY2 array, when each dimension is a power of 2. The FFT is 
applied to each row and then each column, the squared amplitude of each 
term in the resulting complex array taken, and the values scaled by 
1/42?N. This gives an array of values of J at frequencies of the form 
(Qar/NX2,27s/NY2) for 0<r<NX2,0<s<NY2. The estimate (5.5) is 
then approximated by a sum over these frequencies. 


Mercer and Hall Data and Other Examples 
Figure 5.1 illustrates the application of these methods to the wheat yields 


of the Mercer and Hall 1911 uniformity trial, There are 25 plots in the x 
direction (West—East) and 20 in the y direction (North-South), the plots 


0.15 and have 95% 


Fig. 5.1. Mercer and Hall’s yields of wheat on a 20X25 grid of plots. (@) Perspective plot of 
the data. Left—right is West—East, front to back is South to North. (b) Correlogram. The 


spike is of unit height at the origin. Left—right corresponds to lags 0 to 8 in x, front to back to 
lags —8 to 8 in y. (c) Periodogram with range (—7,7] in each direction. (d, e, f, g) 


Smoothed spectral density estimates on log,g scale. (d, e) are for a 


=0.40 with 95% confidence range 0.27. 


confidence range 0.75, (f, g) of a 
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Fig. 5.1 (continued ) 


being about 11 feet square. The particular view of the data chosen 
illustrates its most marked feature, ripples running East-West. These 
show up as barely noticeable ripples on the correlogram and clearly on the 
periodogram as a peak at frequencies (107/32,0) and (—107/32,0), 
corresponding to a wavelength of about 35 feet in the East- West direction. 
Two different smoothings of the periodogram are shown with a=0.15 and 
0.40. The width of the 95% confidence band at each frequency shows that 
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Fig. $.1 (continued) 
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(a) 


Fig. 5.1 (continued) 


the two peaks are the only significant features of the less smooth picture, 
and we should increase the smoothing as shown in Figure 5.1f, g. 

Mercer and Hall noted the ripples in their data but attributed them to 
“casual irregularity.” The spectral analysis shows that they are a signifi- 
cant feature of the data. Perhaps the most likely explanation is a varia- 
tion in soil fertility caused by layers in the outcropping rocks. Patankar 
(1954) found an East-West linear trend representing an 11% change in 
yield across the field. 

Satisfactory published examples of two dimensional spectral analysis 
seem rare. Rayner (1971) and Rayner and Golledge (1972) give some 
examples in geography, and Ford (1976) has a fascinating example from 
forestry research. A 12036 meter plot of 40-year-old Scots pine trees 
was surveyed and the maximum crown height in each meter square 
recorded, as well as whether there was a trunk in that square. The longer 
axis of the plot was West to East, known to be the direction in which 
planting had taken place. To restrict attention purely to the canopy, 
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(a) Autocovariance 
function 


(b) Power 
spectrum 


(, 7) 


Fig. 5.2 Correlogram and estimated spectral density for crown heights of trees. (Reprinted 
with permission from Ford, 1976.) 


heights were measured from 11.8 meters and negative heights truncated to 
zero. Figures 5.2 and 5.3 show the correlogram and a smoothed spectral 
density estimate from these truncated heights and the 1/0 matrix of 
presence/absence of tree trunks. The correlogram for crown height shows 
clear ridges at a spacing of about 18 meters in a direction about 15°W. 
Only one quadrant of the spectral density is shown (the figures in Ford, 
1976 are incorrectly labeled). The full plot shows ridges with direction 
75°E, parts of which are shown at A, B, and C in Figure 5.2b. The peak 
on this ridge at A corresponds to a wavelength of 2 meters in the 
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(a) Autocovariance 
function 


(b) Power spectrum 


(x, 0) 


Fig. 5.3 Correlogram and estimated spectral density for the tree position matrix. (Reprinted 
with permission from Ford, 1976.) 


East-West direction. None of these features is evident in Figure 5.3; the 
peak at a diagonal frequency on Figure 5.36 is small enough to be 
attributed to chance. No apparent pattern in the tree positions is not 
particularly surprising, for although the trees were planted in rows, about 
two-thirds will have been removed naturally or by thinning in a haphazard 
way. The strong directionality found in the crowns could be attributable 
to prevailing wind directions or to a need for the crowns to expand 
diagonally given the original lattice planting. 

Fasham (1978a,b) has used spectral analysis to study patterns of plank- 
ton. 
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5.2 SPATIAL AUTOREGRESSIONS 
A simple autoregressive time series can be defined either by 
X,=aX,_,+e,  €,~N(0,07) independent (5.10) 
or by 
E(X,|past values) =aX,_;, var( X,|past values)=o? = (5.11) 
where X, is assumed in the second case to be a Gaussian process. The 


asymmetry of conditioning on the past is important, for if we consider the 
two-sided versions 


a 
X= G(X t Xia) te, (5.12) 
E(X, |rest) = 5 (Ment +X,,,),  var(X,|rest)=o? (5.13) 


these define different processes and are prototypes for simultaneous (SAR) 
and conditional (CAR) autoregressions, respectively. The distinction be- 
tween them is due to Brook (1964). Note that (5.12) is, in fact, the Yule 
process 


X, =b,X,_, +b,X,_2 +0, 


in a nonstandard representation. 

It will be most convenient to work in complete generality with observa- 
tions (Z,,..., Z,,) at sites, following Besag (1975). We define SAR and 
CAR Gaussian processes by matrices S and C, assumed to have zeroes on 
their diagonals, and 
SAR Z,=ph, + DS, (Za) +6; (5.14) 

e, independent M(0, 07) 


CAR E(Z,|Z,, sAi)=p, + XC,(Z —u,) (5.15) 
Jj 


var(Z,|Z;, j#i) =o? 
n=Z—p—C(Z~p) 


Let V, and V. be the corresponding covariance matrices for the (Z;). 
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Theorem 
E(Z,)=n; 
V,=07(1-S)"'UI-ST)"' (5.16) 
Voso%(I-C)"! (5.17) 
cov(e)=o7J —cov(e, Z)=a7(I—S7) | (5.18) 
cov(n)=07(1-C)  cov(n, Z)=a7I (5.19) 


Necessary and sufficient conditions for existence are 
SAR (J—S) nonsingular. 
CAR (J—C) symmetric and strictly positive definite. 
PROOF. Let Y=Z—yp 
SAR (I1-S)Y¥=e so (I—-S)E(YY')UI-S™)=071 (5.16) 
whence 
cov(e, Z)=E(eY”)=E{ee™(1—S7)"'} =02(I-ST)"' 


If (J—S) is nonsingular, V, is strictly positive definite. If (J—S) is 
singular, (/— S)Y=e will, in general, have no solution. 


CAR (I-C)Wo)y =| BB ~ Ca) %¥s} =E( WY, ~ YE Cu) 


= E{Y,Y, — Y,E(Y,|rest)} 

=E(Y,Y, —E(Y,Y,|rest)} = E(¥,¥, ~¥,¥,)=0 
for j¥#i 

= E{Y, — Y,E(Y, |rest)} =var(¥, |rest) =o? 


for i=j 
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Thus (J—~C)V.=o07I. If (J—C) is singular, this has no solution. If 
(I—C) is nonsingular V. =0?(1—C)~', so (I— C) must be symmetric and 
strictly positive definite. 


cov(n) = E(nn")=(1- C)E(YY")U-C7)=(I-C) VU C) =0°(I-C) 
cov(n, Z)=E(nY7)=(I-C)E(YY")=071 v 


Notice that for a CAR scheme V. determines C, whereas for an SAR 
process many matrices S can give Vz. The second formula in (5.19) is 
important, for it shows that CAR schemes have an “innovations” property 
generalizing the independence of X, and (e,, 5 >?) in (5.10). 

Any SAR process is a CAR process with C=S+S7~—S7S. Since we are 
considering only a finite set of sites, the reverse is true in a rather 
unnatural way. We can take S=J—L’, where LL’ is the Cholesky 
decomposition of (J—C). 

The log likelihood of either scheme is given by 


a ak eae men Te 
5 In2no + 5 In| Bl 553 (2 1)"B(Z—p) (5.20) 


for 
B=(1-S™)\(I-S) or  B=I-—C, _ |B| the determinant of B 


If w is specified by the linear model D@, then the maximum likelihood 
estimators of @,a7 and parameters in B are given by 


6=(D™BD)'D™BZ (5.21) 
62 =n~'(Z— D6)"B(Z— D6) (5.22) 


A 


which are the generalized least-squares solutions for covariance o7B 
and B minimizes 


1 
’ 


—n~ "In| B|+1n6? (5.23) 
For an SAR scheme 6? is the mean square of the residuals 
é=Z—Db6-S|Z-D6| 


so that if the determinant |B|=|/—S|? is not identically one, (5.23) does 
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not give the least-squares solution for parameters in S, and this solution 
may not be consistent (Whittle, 1954). (In Section 5.3 there is an example 
in which the least-squares solution is out by a factor of 2.) The problem 
in the numerical minimization of (5.23) is the evaluation of | B| if this is 
necessary. Ord (1975) pointed out that if C is of the form @H for a single 
parameter 4, then 


|B|= Ih(-96,) 


where &,,...,€y are the eigenvalues of H. This formula also can be 
squared to give | B| for an SAR with S=oH, but the eigenvalues may then 
be complex numbers if H is asymmetric. 

Besag (1975, 1977c) introduced pseudo-likelihood estimation, in which 
the pseudo-likelihood 


PL= p(Z,\Z,, J#i) 
is maximized. For a CAR process 
i 
in( PL) = — = In2a02 — —~||\(I-C)(Z—p) |? (5.24) 
2 20? 


so pseudo-likelihood estimation reduces to finding 6 6? as the mean square 
of the residuals 7=(Z—fi)— C(Z- fi) and choosing € to minimize 62. 
The “innovations” property in (5.19) is the clue as to why a least squares 
fit is sensible for a CAR process, but not for an SAR process. In the 
simple case, C=H, p=0 


oP! =Z™HZ/\| HZ||? 


oP! —@ =n HZ/\| HZ ||? (5.25) 


and the innovations property gives the numerator of (5.25) a zero mean. 
As Besag (1975) points out, this allows ¢”” to be consistent if H varies with 
nin a suitable way. 

The spatial nature of these autoregressions is introduced by taking the 
sites (now labeled by r,s) on a finite part of a rectangular lattice and 
parameterizing S or C using the lattice structure, so S,, or C,, depends 
only on the vector s—r. Some typical specifications are given in Figure 
5.4. The scheme of Figure 5.4c is particularly simple as a CAR, for it is 
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then equivalent to (f) as an SAR and to the product 


Lup =XuXo X, =AX,_ te 


Y,=vY,_1 +e) 


for in each case R(j,k)=A'/lv!*"|, where a=A/(1+X), B=v/(1 +72). 
This process was introduced by Martin (1979). 

Figure 5.4e, f, g illustrate one-sided schemes, which have the advantage 
that |B|=|J—C| or |/—S|? is identically one and the computation of the 
maximum likelihood estimator is eased. The scheme of Figure 5.4g is 
adapted to a TV scan of a lattice, for it exhibits dependence only on past 
values of the scan. The other two are examples of the one-sided SAR 
schemes considered by Tjestheim (1978) on an infinite lattice, for which 
S,, =0 unless r>s coordinatewise. Tjgstheim showed that direct ana- 
logues of the Yule—Walker estimation procedure and the Mann-Wald 
asymptotic theory for autoregressive processes are available for such 
schemes. They are easily re-expressed as symmetrical CAR schemes by 
C=S+S7—S'S. Moreover, Tjgstheim shows that any reasonable homo- 
geneous process on an infinite lattice has an infinite one-sided simulta- 
neous moving average representation. It is not easy to construct finite 
one-sided SAR approximations to other processes. Nevertheless, one- 
sided SAR schemes are important. 

Whittle (1954) in a remarkable early paper gave some results as part of a 
general Fourier inversion method which amount to determining |/—S|?/” 
asymptotically for some simple schemes. For Figure 5.45 


j=ik=o [kM j-k)!] 


and for a=8 (Figure 5.4a) this reduces to 


S 2(74) gas (5.27) 


galt \JI 


Other evaluations of |B| based on finding eigenvalues are given by Besag 
(1977a, and in the discussion of Bartlett, 1978b), and Ord (1975, Appendix 
C). 

“Coding” schemes for CAR processes use the distribution conditional on 
the observations at a carefully chosen set of sites (see Figure 5.5) so that 
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Fig. 5.4 Weights for autoregressive schemes. The value at the site marked by a circle is 
regressed on those sites with weights marked. 


the remaining observations are conditionally independent with equal vari- 
ances and means which are linear functions of the parameters, and are 
Normal. These parameters can then be fitted by least squares and have 
an exact conditional sampling theory. The procedure is wasteful of data 
and seems particularly likely to give estimates of J—C that are not positive 
definite. Besag (1972b, 1974) used coding estimators; Besag and Moran 
(1975) and Besag (1977c) explored their efficiency and concluded that 
pseudo-likelihood estimators were preferable in the cases considered. 
Even estimates of the asymptotic covariance matrix of the parameters in 
S or C are not readily available, and most of the published estimates are 
given without standard errors. Two exceptions are Tjgstheim’s one-sided 
SAR schemes for which he gives the asymptotic theory {for Martin’s 
process, the asymptotic covariance matrix of (A, #) is n~! diag (1-»?,1- 
v*)], and coding estimators (at least conditionally). If the likelihood is 


Fig. 5.5 One coding scheme for the process 
of Figure 5.4(b). Sites marked x are taken as 
e x e x e x e the conditioning set. 
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maximized in part numerically some of the numerical second derivatives 
may be available. Although no general theory is available, there is a 
general belief that one will be developed paralleling that for Markov 
processes (Billingsley, 1961), which will show the usual relationship be- 
tween these derivatives and the asymptotic covariance matrix. Whittle 
(1954) and Ord (1975) give some results for Gaussian processes. 

Autoregressions have been fitted to the Mercer-Hall wheat yield data by 
Whittle (1954) and Besag (1974, 1977a). The periodicity noted in Section 
5.1 may explain why the fit is not at all good. Furthermore, the data are 
not observations at lattice points, but represent averages of a continuous 
surface of soil fertility with some added measurement error on each plot. 
Besag (1977a) fitted a covariance matrix of the form 


V=alt+B(I-C)' a, B>0 


corresponding to additive independent errors of variance a. Mead (1966, 
1967, 1968, 1971) fitted SAR processes to small competition experiments. 

Thus far we have confined attention to Gaussian processes. The condi- 
tional models can be viewed within the Gibbsian formulation of Section 
2.4. Take the base process as independent M(u,07) observations. Then 
the probability density of a CAR process ts 


|I—C|Pexp| (2-1 )'C(Z—H)| (5.28) 


If C,, #0 only if r and s are neighbors on the lattice, then (5.28) is of the 
form (2.10) and the CAR scheme is both Markov and pair potential. 
Note that the normalizing constant is the awkward determinant. 

This suggests a generalization by taking Markov pair potential processes 
with respect to other base processes. For binary variables Z, we find 


P(Z,=2,,1=1,...,.N)=Aexp{ > 0,2, + D8,;2:2;} (5.29) 
if; + 28,52; 
exp (2 (0 > 2) 


This is a logistic model for Z, with the other observations as explanatory 
variables; Besag (1974) terms this an autologistic model. It is the classic 


P(Z, =z,|rest) = 


AGRICULTURAL FIELD TRIALS 95 


Ising model of statistical physics, in which the values 0 and 1 correspond 
to “up” and “down” magnetism of atoms in a crystal. Besag fitted this to 
presence/absence in a grid of quadrats (see Section 6.5), but as Mead 
pointed out in the discussion of that paper the model does not seem 
ecologically enlightening. 

A further possibility is to take a count N, at each site and modify a base 
process of independent Poisson variates. A simple pair potential process 
is given by 


PN =n; i= Lanny n)=A exp | > [a,N,-In N!] + 2 A,NN,| 
ij 


(5.30) 


when the conditional distribution of N, given the rest of the counts is 
Poisson with mean a,+2,B,,N;. This Besag termed an auto-Poisson 
model. Considering P(N, =n;, N, =n,|rest) shows that A in (5.30) can be 
chosen to define a probability only if B,, <0 for alli, j. Thus an auto- 
Poisson process can only model competition within the population being 
counted. 

Spatial moving averages, with V instead of V~' parameterized in a 
simple way, are often mentioned but rarely used (Haining, 1978). 


5.3 AGRICULTURAL FIELD TRIALS 


One application of the spatial autoregressions discussed in Section 5.2 is to 
the field trials described in the introduction to this Chapter. We shall 
need only a few formulas from the previous section. Suppose that the p 
varieties have mean yields (@,,...,6,) and are distributed r times each 
according to some experimental design. Then if Z; is the yield on the ith 
plot 


E(Z;)=(D8),;  D’D=rl 


Let N be the matrix of 0’s and 1’s giving which pairs of plots are neighbors. 
Then 


Z=D0+n (5.31) 


and we expect the “errors” 7 to be correlated. Papadakis (1937), in a 
method discussed further by Bartlett (1938, 1978a,b) proposed to adjust 
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the variety means by an analysis of covariance involving the residuals from 
neighboring plots, that is, to fit 


Z=D0+aN| Z—D6"5|+e (5.32) 
where 645 are the usual least-squares estimates. The analysis of covari- 
ance chooses a and @ by least squares. This is an approximation to the 


more logical idea of fitting 


Z=D0+aN[Z-—D0)|+e 
Or 
[1-aN][Z-D6]=e 
or 


Z=D0+7; n=aNynte (5.33) 


which introduces an SAR model for the residuals. From the theory of 
Section 5.2 the maximum likelihood estimator is given by 


6=(D™BD)'D™BZ 
DB z—D6|=0 (5.34) 
For our SAR process, 
B=[I-aN]’=[1-2aN+a7NN], 
so 
r6=D'Z—&D™N[21-aN]| Z—D6| (5.35) 
The first term is the usual least-squares estimate. For small & (5.35) is 
similar to (5.37) apart from the factor of 2. However, if we consider a 
CAR model for y in (5.31) with C=BN we find 
r6=D™Z-$D™N| Z—D6| (5.36) 


which is closer in spirit to Papadakis’s estimator, for his estimate is given 
by 


r6” = D™Z~&D™N| Z—D6*5| (5.37) 
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Indeed, if a=£ were known, (5.37) can be regarded as a first step in the 
iterative solution of (5.36). Bartlett suggested the iteration of (5.37) and 
Martin in the discussion of Bartlett (1978b) showed that in one case the 
iterated solution of (5.37) converges to that of (5.36). It is perhaps not too 
surprising that a CAR appears when we note that least squares gives 
inconsistent estimates for an SAR process but is a reasonable (pseudo- 
likelihood) procedure for (5.31) with a CAR process of “errors” 7. For 
small a, an SAR scheme with a and a CAR scheme with B=a/2 give 
approximately the same covariance matrix. Since minimizing n~'\\(J— 
aN)Z|\* is consistent for the CAR scheme, least squares must give an 
estimate of approximately a/2 for the SAR scheme! If we consider 
iterating (5.37) for fixed & we find, with E=aN/r, 


6 =r~'D™Z— D'EZ + DTEDO 
=| 1+D"ED+ ve +(DTED)~? |" r-Y—E]z 


+(DTED)"6© 
which converges to 
6=[1-DT7ED|"'D™[r-Y-E|z 
or 
r6=D™Z-—a&D™N| Z—D6| 


which is (5.36) with =f. For the convergence to hold, we need the 
largest eigenvalue of D™ND to be less than r/&. The matrix D’ND gives 
for each pair of varieties the number of times they were adjacent. Re- 
cently R. J. Martin showed that this condition holds for & less than the 
reciprocal of the number of neighbors of each plot. 

Adjusting by aN[Z-— D8] is less attractive at the corner and edge plots 
that have fewer than four neighbors. It is not clear whether in practice it 
is better to adjust by the average of the (4, 3, or 2) residuals on neighboring 
plots or even to use diagonal neighbors at the corners. These suggestions 
correspond to alterations in the supposed covariance matrix of the “errors” 
nN. 

The reader is recommended to peruse Bartlett (1978b), especially the 
discussion, for details from both theoretical and practical points of view. 
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5.4 REGRESSION AND SPATIAL AUTOCORRELATION 


The other application of spatial autoregressions has been in quantitative 
geography. The observations are economic or population statistics on 
each of a set of contiguous regions. The methods depend on a weighting 
matrix W expressing the connections between the regions. This can be 
just a binary matrix giving | if the two regions have a common boundary, 
0 otherwise, or it could depend on the length of the common boundary, the 
distances between the regions or transport costs between them (in which 
case W might not be symmetric), Cliff and Ord (1973) and Haggett et al. 
(1977) discuss the choice of W at some length. Bartels (1979) suggests 
that simple binary weights have proved as adequate as more complex 
schemes. Even when W has to be estimated from other data it is regarded 
as fixed. 

The main thrust of research has been into investigating tests of no 
correlation among the observations or among residuals from a regression 
of the observations on explanatory variables. Indeed, the philosophy 
adopted seems to have been that if “spatial autocorrelation” is found more 
explanatory variables should be introduced until it disappears! 

For binary weights Moran (1950) and Geary (1954) introduced the 
following coefficients of autocorrelation 


1=| n> 6,(x,-x)(x,-¥)| / (2a, Be-x7| 


c=|(n- 1) > 6,,(3; -x)]/[2d8,)3 (x; -x,| 


respectively, where x,,...,.x, are the observations on 7 regions and 6,;=1 
if ix+7 and i andj are contiguous, 0 otherwise. Cliff and Ord generalized 
these definitions by replacing 6,; by W,;. These coefficients have been 
thought of as tests of p=0 in the SAR process 


X=p+ow(X—p)t+e (5.38) 


For joint Normal data we can find the Neyman-—Pearson test from (5.20) 
for p=p, >0 as alternative to p=0. This is based on 


[(x— A)" pW (x8) | / [(x- 8)" (x-3)] 


which for infinitesimal py reduces to a multiple of J. 
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To discuss asymptotic Normality of J or C we have to consider how our 
set of regions might be supposed to become infinite. A proof of asymp- 
totic Normality is given by Cliff and Ord (1973) and a more careful one by 
Sen (1976). The conclusion of these studies is that assuming Normality 
will not be a good approximation for sparsely connected regions such as 
those shown in Figure 5.6. A lot of effort has been spent in evaluating the 
theoretical moments of / and C both for independent Normal x,,..., x, 
and for the randomization of the given observations amongst the regions. 
We only consider independence. Cliff and Ord show that 


E(1)=—1/(n-1) 
E(17)=(n7S, —nS, +37)/{(n? —1)27} 
E(C)=1 
var(C)=[ (2S, + S,)(n—1)— 427] /{2(n+ IQ?) 


where Q= Zw, ; 
S,= : > (w,, Ww)” 
S,= > (w,.+w,)° 


where - denotes summation over that index. These formulas are not valid 
for the residuals from a regression. Brandsma and Ketellapper (1979) 
note that there is no need to subtract the mean from residuals, and 
consider 


Y=DB+y 


Fig. 5.6 A sparsely connected system for 
which spatial autocorrelation coefficients 
are not approximately Normal. Only re- 
gions joined by arrows are considered to be 
contiguous. 
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and test statistic 


T ~ 
z We u=y— DB (5.39) 
uu 


GMC= 


Then if B is fitted by least squares, u=My, where M=[I-— 
D(D™D)~'D"], and 


E(GMC) =tr{ MW} /(n—k) (5.40) 


RIGMC = (n—k)(n—k+2) 


(5.41) 


where there are n observations and k explanatory variables and where 
tr{ A} denotes the trace of a matrix A (the sum of its diagonal elements). 
The key step in the derivation of all these formulas is the Pitman— 
Koopmans lemma, which asserts that the quotient and denominator of 
each of J, C, and GMC are independent for independent Normal observa- 
tions. 

Testing for spatial autocorrelation among the residuals of a regression is 
a slight extension of the problem of testing for serial correlation in the 
residuals of an econometric regression, and the recursive and BLUS 
procedures of that field have been tried. Brandsma and Ketellapper 
(1979) report a simulation study which recommended the GMC test 
Statistics applied to the usual residuals, considered to be Normal with 
mean and variance given by (5.40) and (5.41). In particular, the full 
likelihood ratio test based on fitting 8 and p in 


Y=DB+n n=pWnte 


by maximum likelihood did not do well, it seems because of the inade- 
quacy of the x? approximation to the distribution of —2log(likelihood 
ratio). (A systematic bias in this approximation appears in all their 
examples and will seriously affect the power comparisons.) 

Cliff and Ord (1973) found a Normal approximation for both J and C to 
be adequate except for few regions (n< 10) or an unusual connections 
matrix W when the Monte Carlo tests of Section 2.5 should be used. 
They preferred / to C both in simulation studies and by showing that the 
asymptotic relative efficiency of C to / is not greater than one (as might be 
expected, since J approximates the Neyman- Pearson test), being 
2S, /(2S, +5, —4n). However, the difference is small with regular sys- 
tems of weights. Their Normality conclusions also apply to testing residu- 
als by GMC. 
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The earliest studies of spatial autocorrelation were for presence/absence 
data. Suppose that at each site there is a color, black or white. The null 
hypothesis is that black occurs independently at each site with probability 
p. For binary weights Moran (1948) and Krishna Iyer (1949) evaluated 
the moments of the numbers BB of black—black and BW of black-white 
joins. 

Lebart (1969) extended Geary’s coefficient to give a spatial correlogram, 
using a series of weight matrices expressing “contiguities of different 
orders.” Cliff et al. (1975) and Hodder and Orton (1976, pp. 179-183) 
give a correlogram in which the weights for each “lag” depend on the 
distances between regions. 

The cited works of Cliff and colleagues and Dacey (1968), Fisher (1971) 
and Hordijk (1974) give applications of spatial autocorrelation in geogra- 
phy. Jumars et al. (1977), Jumars (1978), and Sokal and Oden (1978) give 
biological examples and Hodder and Orton (1976) give some uses in 
archeology. 

A few attempts have been made to estimate the correlation structure and 
fit a linear regression model of the form 


Y= DB +n, n=pWnte, e~N(0, 07/) 


Bodson and Peeters (1975) fitted such a model by least squares, which as 
we saw in Section 5.2 is inconsistent. Ord (1975) and Hepple (1976) used 
maximum likelihood estimation of p and B simultaneously. The Cochran— 
Orcutt scheme used by Ord amounts to iterating (5.21), (5.22), and (5.23). 
Hepple (1979) gives a Bayesian approach (using the “natural conjugate 
prior” of p, loge? and 8 independent and uniform). The latter is very 
close in spirit to the approach of considering the whole likelihood function 
taken for example by Edwards (1972). Adrian Smith takes a different 
Bayesian view in the discussion of Bartlett (1978b). 

It is possible to extend all this theory to space-time processes. If we 
assume say 


Z(t) =pWZ(t—1) +e(2) 


where Z(t) is a vector indexed by the regions, we do not encounter any of 
the problems posed by the determinant in (5.20). Such models are consid- 
ered by Cliff and Ord (1975) and Cliff et al. (1975), and extensively by 
Bennett (1979). Even if W uses spatial structure these models are multi- 
variate time series and have little in common with the methods of this 
volume. 
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CHAPTER 6 


Quadrat Counts 


In this chapter we look at analyses of data from small subareas, called 
quadrats after the square frames used by ecologists to mark out such 
samples. Sampling is usually either random or systematic and exhaustive 
as illustrated in Figure 6.1. One way to attempt random sampling has 
been to throw quadrats over one’s shoulder; Greig-Smith (1964) reports 
studies that show that this is insufficiently chaotic, so the centers of the 
squares are usually found with reference to a table of random numbers. 
Within each quadrat the experimenter may count the population of plants 
or animals or take other measurements, such as the yield of a grassland or 
“cover.” The latter is an estimate of the abundance of a plant species by 
further sampling; a number of pins are placed in the quadrat and the 
proportion that touch a plant of that species is recorded. Sampling in 
plant ecology is discussed by Greig-Smith (1964) and Kershaw (1973) and 
for animals by Southwood (1978). 


6.1 INDICES 


Quadrat sampling may have one or both of two distinct aims. A measure 
of the abundance per unit area is given by the average count for the 
quadrat samples divided by the quadrat area. Random positioning 
of the quadrats allows us to take averages over the randomization. Then 
the count per unit area from the samples is an unbiased estimator of the 
population count divided by the total area for the region within which the 
samples are taken. We will assume from now on that the measurement in 
each quadrat is a count and define the intensity to be the population count 
per unit area. Of course, the variance of this intensity estimate depends 
on the spatial pattern of the individuals being counted. If they are 
assumed to follow a Poisson process of intensity A, the count within a 
quadrat of area A has a Poisson distribution with both mean and variance 
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Fig. 6.1 (a) 25 random quadrats. (b) 1616 grid of quadrats. 
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equal to AA. The unbiasedness of an intensity estimate from random 
quadrat sampling makes it a good method for estimating the total popula- 
tion size when it is feasible. Some practical problems and alternative 
methods are discussed in Chapter 7. 

The other aim is to investigate the pattern of the population under study. 
Many indices have been proposed based on counts (x,,...,x,,) from a set 
of quadrat samples. For Poisson-distributed counts, the variance /mean 
ratio is one. Fisher et al. (1922) and many others since have suggested 
considering the sample equivalent s?/x (where s? = D(x, —X)?/(n—1) and 
¥=Zx,/n as usual). David and Moore (1954) introduced their “index of 
clumping” 


ICS=s?/¥-1 (6.1) 


The idea of this index is that if the population is clustered, the index will 
be large, whereas if the individuals are regularly spaced the index will be 
negative. Furthermore, the sampling variability of the intensity estimate 
x will increase with ICS. The sampling distribution of JCS is unknown 
even for a Poisson process generating the population. For a significance 
test of the null hypothesis of a Poisson pattern s7(n— 1)/< is often referred 
to a X7,—1) distribution. 

The notation ICS and ICF=x/ICS was introduced by Douglas (1975). 
ICS is his index of cluster size, for if we imagine a population with exactly 
m individuals at each point of a Poisson pattern, we would find [CS 
(m— 1) as the sample size increased and s? and x approached their means 
m?)\A and mAA. If m is distributed as a Poisson random variable of 
mean p independently at each point JCS-»p. The index of cluster 
frequency JCF should measure the mean number of clusters per quadrat 
and so be proportional to the area A of the quadrat, whereas JCS is 
(asymptotically in n) independent of the quadrat size or shape. For 
similar reasons, Lloyd (1967) defined an “index of mean crowding” x* and 
“index of patchiness” JP by 


x*=X+(s?/¥-1)=¥+ICS 


IP=x*/¥=1/ICF+1 (6.2) 


Lloyd considered x* to represent the number of individuals sharing a 
quadrat with a typical individual, the two terms representing those in other 
clusters and those in the same cluster. If we look at the nx individuals in 


INDICES 105 
turn, we find the average number sharing the quadrat to be 
Dd x(x, -1)/nx=(1-1/n)(s?/¥+X)-1 


IP is found by rescaling by the intensity for a Poisson process. 
Douglas (1975) derived standard errors for [CS and JCF as follows: 


[ics-(#-1)}'-[ 2-2] =| eee) 2) 
H x oo u+(xX-n) BP 


se. 2 
07 +(s? ~0?)- 9? Fb) 0" : 


pepe 
ry 


so 
var( ICS) =p *var(s*) —207cov(s?, X)/pt+o4var( X)/p? 


The right-hand side can be expressed in terms of the first four moments of 
a count. These moments are then estimated by the moments of the 
observed counts. Douglas gives a computer program in APL to evaluate 
this and the comparable expression for ICF. 

An alternative to the patterns of clusters at points of a Poisson process 
used so far is to assume patches, larger than a quadrat, of different 
intensities. Morisita (1959) introduced 


I, =n > x,(x, —1)/ {nx (nx—- 1} =nx-IP/(nx— 1) (6.3) 


The idea is that J; (or JP) will measure the variability in intensity between 
the patches and so be little affected by the quadrat size. If x; has a 
Poisson distribution mean A,, the mean of J, converges for large nx to 
n=X /(ZA,)*, so 1, is a reasonable measure of the variability in \. 

Another justification used for these indices is to consider random 
thinning of the pattern, in which each individual survives (or is recognized) 
independently with probability 6. Then for large samples JCF is indepen- 
dent of 0, whereas JCS is reduced by the factor 6. (Proofs are given by 
Douglas, 1975 and Pielou, 1977, pp. 126-132.) It has been argued that 
random thinning only reduces the intensity but not the pattern features 
(Hill, 1973), but for our process of point clusters the clusters disappear for 
very small @, since the probability that two or more individuals will survive 
from m is approximately m(m—1)@?/2. Another counterexample is given 
by Ripley (1978, p. 971). 
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L. R. Taylor has for some years suggested another type of index for 
counts at distinct sites that could be applied to quadrats. He suggested 
that for a large number of data sets 


s? wax? (6.4) 


and that b was a characteristic index. Taylor et al. (1978) give further 
references and check the fit of (6.4) to a large number of examples. 
Clearly, (6.4) must be an approximation, for at low intensities of counts 


s?~ D> x? /n- xX? ~x- xX? 


when the counts are 0 or 1. Nevertheless, the index b does seem to be a 
useful summary of the data without any specific spatial interpretation. 


6.2 DISCRETE DISTRIBUTIONS 


The methods of the previous section are not really spatial in that they 
ignore the spatial pattern of the sample. As such, they can really only be 
recommended for exploratory work and for their simplicity. Neverthe- 
less, extensive research has gone into analyzing counts from random 
quadrats more closely and fitting parametric families of distributions to 
these counts. Rogers (1974) is a good reference at an elementary 
mathematical level. Douglas (1979) is more advanced. 

Our two schemes of Section 6.1 can be formalized. The point cluster 
model gives rise to generalized distributions with 


N=N,+-+-: +My 


the N,’s being independent random variables all with distribution P,, 
giving the number in each cluster, whereas M is an independent random 
variable with distribution P, giving the number of clusters. We refer to 
the distribution of N as P,\/P,._ The patches of different intensity model 
gives rise to a compound or mixed distribution P,/\P, in which the 
parameters of a discrete distribution P, representing the count variability 
are sampled from a distribution P, representing the variations in intensity 
found by the random quadrat positioning. 

Many distributions can be defined by one or both procedures. Com- 
mon examples are 


Negative binomial = Poisson\/log series = Poisson/\gamma 
Neyman type A = Poisson\/ Poisson = Poisson (Poisson 
Thomas = Poisson\/(Poisson + 1) 
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The index k of a negative binomial distribution has been suggested as an 
index in the sense of Section 6.1. The asymptotic value of JCF for a 
negative binomial distribution is k, and the methods of moments estimator 
of k found by equating x=E(N),s* =var(N) is ICF. The maximum 
likelihood estimate of &k can only be found by iteration or numerical 
maximization. 

Consider the two explanations of the negative binomial distribution. 
For the patches model, we have as asymptotic values 


ICF= gamma shape parameter 
ICS=A/p A-quadrat area 


p-gamma scale parameter 


whereas for the point cluster model for a Poisson process of intensity and 
log series parameter a 


ICS*a/(1—a) 
ICF= Aé/{ —log(1—a)}. 


Thus the variation in the two indices with quadrat size should discriminate 
between the two explanations. Unfortunately, both mechanisms could be 
present, and the mathematics is an unrealistic abstraction. To have any 
hope of understanding the spatial pattern several quadrat sizes will be 
needed. 

Two of the underlying assumptions of this method are particularly 
suspect. Unless the quadrats are widely spaced the counts are unlikely to 
be independent as is assumed in the usual methods of estimating parame- 
ters. Indeed, in the patches model we must have no more than one 
sample per patch. Hence these methods should not be used for the 
exhaustive division into quadrats shown in Figure 6.15. Second, it is 
unlikely that all the clusters will be small enough to be contained within a 
quadrat. Gleeson and Douglas (1975) explored in a simulation study the 
effect of realistically sized clusters when fitting the parameters of Neyman 
type A and Thomas distributions. (See also Pielou, 1957.) A related 
problem is to describe the association between two or more types of plants 
(different species or the same species in different years, for instance). 
Pielou (1977, Chapter 14) reduces the problem to a contingency table 
giving for types A and B the number of quadrats containing both A and B, 
only A, only B and neither. For r types an rXr table can be drawn up. 


108 QUADRAT COUNTS 


If this reduction in the data is acceptable and if the observations in 
different quadrats can be considered to be giving independent information 
then standard methods can be used to test for association in the table. 
Again, practically all the spatial information is ignored. 
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Greig-Smith (1952) suggested the systematic division of an area into arrays 
of N quadrats, usually 16X16, as illustrated in Figure 6.15. These basic 
quadrats can then be combined into larger rectangles as seen in Figure 6.2. 
Not only does blocking provide a sequence of quadrat samples of increas- 
ing area, it also uses the spatial arrangement of the data. It must be noted 
that the area of the quadrats and the spacing of the information considered 
are increased at the same rate and effects due to large quadrats and to 
large distances could be confounded. Kershaw (1957) proposed using a 
line of quadrats rather than a square grid. His modification has been 
widely accepted because for a given amount of effort it provides informa- 
tion on many more “scales of pattern.” The difficulties of using a 
one-dimensional sampling technique to explore a two-dimensional pattern 
are perhaps more serious than users of Kershaw’s method have suggested. 
The following discussion applies to either lines or grids. 

Greig-Smith’s analysis was to compute the sum of squares S, for each 
block area r and use these in a nested analysis of variance. Consider S, 
and S,,._ The blocks at size 2r are each divided into two counts (a,, b;) 
for i=1,...,m=N/2r. 


S=*D(q+87) 8, = Dla +b) 
! 1 


| 


m m 
S,-S,,= = D [2a? +26? -a? —b? -2a,b,)=3- S(a;—-,)” (6.5) 
1 | 


N 


r 


We call S, a sum of squares, not a mean square, as it is the reduction in 
sum of squares obtained by allowing a different mean in each of the 2m 
blocks of size r. In a nested analysis of variance the mean square 
M, =(1/m)(S, — S,,) is associated with allowing different means in each 
half of each 2r block. Equation (6.5) shows that M, is the average over 
blocks of size 2r of the squared differences between the counts in the two 
halves of the block divided by the block size 2r. It would therefore be 
more appropriate to label this mean square by 2r, but we will keep to the 
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Fig. 6.2 Blocking of 16 quadrats into 4x2, 2x2, 2x1, and 1X1 blocks. 


conventional terminology. Hill (1973) proposed a “two-term local vari- 
ance” in which all blocks of a given size and shape were used in the 
average for M,, not just the nonoverlapping set shown in Figure 6.2. 

It is usual to plot M, against r and to note peaks and troughs in the 
resulting plot. This is more satisfactory for a line than a grid, when 
alternate points refer to square and rectangular blocks. The usual F tests 
in the analysis of variance could be used, but typically M, will be inflated 
by clustering on the scale of a quadrat (which is biologically uninteresting). 
Thompson (1955) calculated expected mean squares for Neyman-Scott 
cluster processes (see Section 8.4); these he used in Thompson (1958) to 
explore which features of the plots would correspond to parameters in 
these models. The assumptions of Normality and independence underly- 
ing the F distribution are usually untenable and “significance” has been 
assessed by comparing plots from several transects and extracting their 
common features. It is important to note that the mean squares at larger 
block sizes are based on fewer blocks and so are more variable. _Intrinsi- 
cally there is less information at these block sizes and Hill’s modification 
cannot help. Usher (1969, 1975) and his student Errington (1973) have 
tested Greig-Smith’s analysis for various artificial patterns and have shown 
that the choice of the origin of a transect (and presumably also of a grid) 
can markedly affect the plots. A simple explanation is as follows. Sup- 
pose there is a soil fertility fluctuation of period about 27 times the quadrat 
size. Then for some starting positions a large mean square will appear at 
block size r and for some at r/2, as shown in Figure 6.3. 

Moellering and Tobler (1972) used a similar analysis for a geographical 
example. Ecological applications are given by Allen (1973), Anderson 
(1961, 1967, 1971), Austin (1968), Cooper (1960), Greig-Smith (1961, 1964), 
Greig-Smith and Chadwick (1965), Hall (1971), Hill (1973), Kershaw 
(1957, 1958, 1959, 1960, 1961, 1973), Morton (1974), Owen and Harberd 
(1970), Pemadasa et al. (1974), Riechert (1974), Riechert et al. (1973), 
Usher (1975), Walker et al. (1972), Westman (1975), Westman and Ander- 
son (1970), and Williamson (1975). Greig-Smith (1979) gives a recent 
survey. 

The data are counts so we could replace the analysis of variance by a 
series of likelihood ratio tests for independent Poisson counts. This 
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Fig. 6.3 The effect of the starting point of a transect, where (a) has a large mean square at 
block size r and (b) at size r/2. 


analysis would correspond to a Poisson pattern allowed different intensi- 
ties within the blocks. Nelder and Wedderburn (1972) term this an 
analysis of deviance. Let »; be the mean for the jth quadrat, and m, =In p,. 
Then the log likelihood of counts Z,,..., Z, is 


ini 31 pF Zing (ZO) (6.6) 
i 


If we consider blocks of size r within blocks of 2r we have the test statistic 
T,, which in the notation at (6.4) is 


T,= ) (a, +b,)In{(a; +b;)/2r} —a,in{a;/r} —bln{b,/r} 
1 


=L,, -1,-(3z,) in (6.7) 


i 


where L, is the sum of Z1n Z for the total count in each block of size r. 
From standard maximum likelihood theory the distribution of —2T, is 
(independent) Xin /2ry asymptotically for large N. Orloci (1971) also de- 
rived (6.7), but from the standpoint of minimum discrimination informa- 
tion. He considered confidence bands based on randomizing the spatial 
positions of the N quadrat counts. 

These Poisson likelihood ratio tests can be computed (rather ineffi- 
ciently) by the program GLIM (Baker and Nelder, 1977), which can also be 
used to test other hypotheses. For instance, it could fit a trend surface 
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model for m, to reveal or remove broad trends in the counts. The 
assumption of a Poisson distribution for the counts is probably better than 
a Normal distribution, but the possible presence of clusters suggests that a 
generalized distribution would be better. GLIM can be used to assume 
only that the variance of the distribution is proportional to the mean, 
which is appropriate for such models. For the model with different 
means in blocks (but not for trend surface models) the effect is to multiply 
the deviance by an unknown constant. Ratios of the deviances again give 
F tests which will be more appropriate than those under Normality 
assumptions. Some examples are discussed in the next two sections. 

Mead (1974) proposed tests based on randomizing the positions of the 
quadrats in a transect. Consider blocks in sets of four. For each four 
consider all possible differences between the totals of two pairs from the 
four. There are three such pairs. The observed difference between the 
sums of the first and second pair is compared with the randomization 
distribution. An example may be helpful. Counts 


oe 0220 001 10 11102 594 10 
e (0,0,4) — (11,9,9) (108,12) (0, 10,2) 


for the differences between the halves, in each case the first of the three 
being the observed difference and the other two equally likely under 
randomization. The sum S is 21; under randomization P(S <21!)=28/81 
and P(S> 21) is 67/81. A “distribution-free” form of the test is formed 
by replacing triples of differences of the form 


(a,a,b) by (0,0,2) a<b<c (6.8) 
(a, b,c) by (0,1,2) 
(a, b,b) by (0,2,2) 


If the counts are blocks of size r, this test is of the difference between the 
halves of a block of size 4r and so is labeled by 2r. The virtue of this 
procedure is that the tests made at sizes 2,...,N/4 are independent. 
Although Mead did not do so, each of the ¢ tests should be performed at 
level 1—(1—a)'/"~a/t to obtain an overall significance level of a. The 
exact permutation distribution can be found for transects of up to 1024 
quadrats by careful programming (to avoid overflows). For small block 
sizes it may be convenient to use a Normal approximation to the distribu- 
tion of S. For the “distribution-free” version this is 


= S—(2n, +3n,+4n;)/3 
V {8(1, +13)+6n,}/3 
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where there are n,, n,, and n, triples of the three types in (6.8). A 
two-dimensional analogue is available and is considered briefly by Mead 
(1974, p. 306) and Besag and Diggle (1977). An exact analogue can be 
applied to alternate horizontal and vertical pairings or only square blocks 
considered, so that four blocks of size 4r are considered within a 16r block. 
Any statistic that compared the four subblocks would have a permutation 
distribution over (16!)/(4!)° ~2.6 million permutations. Besag and Diggle 
suggested using random permutations to explore this distribution. One 
possibility is to use these random permutations to estimate the mean and 
variance of the statistics for each 16r block and refer the sum over the 
blocks to a Normal distribution. 

Zahl (1974, 1977) looked for a method to give a simultaneous test 
statistic of all sizes in a Greig-Smith type analysis with Normally distrib- 
uted “counts.” His approach for a grid (based on Scheffé’s S method) is 
complex, needs the inversion of large matrices and has no analogue for a 
transect. His alternative hypothesis is a “cluster,” a block of size rXc of 
higher or lower intensity than the rest of the pattern. His simulation 
study shows disappointing power even against this unrealistic alternative. 


6.4 ONE-DIMENSIONAL EXAMPLES 


An obvious approach to the analysis of data from a transect is to regard it 
as a time series, remembering that the direction of “time” has no meaning. 
Spectral analysis has been considered by Hill (1973), Usher (1975) and 
Ripley (1978) for transects of counts. Unlike all the blocking methods 
except Hill’s, it is not seriously affected by the starting point of the 
transect. In this section we illustrate the Hill, Greig-Smith, Mead, Pois- 
son likelihood ratio test (P/rt) and spectral methods. The examples are 
taken from Ripley (1978). The values of —2T7, for the Pirt analysis are 
given in Table 6.1. 

The first example was a simulation of a random pattern of counts— 
independent Poisson observations all with mean one. The results are 
shown in Figure 6.4. The peaks in the Greig-Smith and Hill analyses at 
about block size 32 must, of course, be due to chance. (Remember the 
variability on those plots increases from left to right.) All the values of 
—2T, and Mead’s test are fairly typical values for their assumed distribu- 
tions. The cumulative periodogram should be within the central band 
with probability 0.95. The spectral density should be approximately con- 
stant. The sampling fluctuation expected at each ordinate is indicated by 
the double-headed arrow. The equivalent block sizes shown are one-half 
the wavelength. 
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Table 6.1 Values of — 27, for the Data of Figures 6.4-6.87 


Figure 
Block Size af 6.4 6.5 6.6 6.7 6.8 
1 64 79.4 62.8 100.5 35.6 199.5 (256) 
2 32 32.3 82.0 117.7 32.8 200.2 (128) 
4 16 17.3 68.3 122.1 55.1 129.6 (64) 
8 8 7.10 6.29 121.7 56.7 80.9 (32) 
16 4 6.00 6.50 5.28 26.3 77.4 (16) 
32 2 5.02 4.44 7.26 24.9 59.9 (8) 
64 1 0.67 1.08 0.87 8.52 21.6 (4) 
128 — — _— — — 2.42 (2) 
256 — _— — — — 44.8 (I) 


*(Except Figure 6.6, which is another simulation of the same process.) The 
nominal degrees of freedom given are for the first four columns; those for the last 
column are given in parentheses. 


Figure 6.5f illustrates a simulation of a random pattern on an area with 
varying fertility as found in the Mercer-Hall data in Chapter 5. The mean 
of the Poisson count is (1+sincx), where the wavelength 27/c is 10 
quadrat sides. Mead’s analysis gave a significant result at size 2 and a 
nearly significant one at size 8. Hill’s analysis has a peak at block size 5, 
and most significantly, a trough at block size 10. These are reflected to a 
lesser degree in Greig-Smith’s plot at sizes 4 and 8. The spectral analysis 
clearly picks out the periodicity in “fertility.” The Pirt analysis gives 
extremely significant results at block sizes 2 and 4. 

Figure 6.6f shows a pattern of clumps which was sampled by a line 
transect. The clump diameter is about 4.5 and the mean spacing between 
clumps about 10, in units of a quadrat side. Neither Hill’s nor Greig- 
Smith’s analyses pick out these features although on another simulation 
both showed peaks at block size 8. Mead’s analysis gave a significant 
result at block size 2 (and at 4, 8, and 16 on the second simulation). The 
spectral analysis indicates a significant domination by low frequencies and 
has a peak at equivalent block size 10. For the Pirt analysis the result at 
block size I is extremely significant. We take this to indicate a gener- 
alized distribution for the counts and look at ratios of —2T7,. For this 
analysis on the second simulation block sizes 2, 4, and 8 were individually 
significant at 1%. The second simulation gave better results than the first, 
but none of the analyses clearly picks out the features in the original 
pattern that are obvious by eye. 
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Figures 6.7 and 6.8 show the analyses of two sets of real data. The first 
is one of eight transects taken by Kershaw (1957) and reanalyzed by Mead 
(1974) and Douglas (1975). The observations were counts on a score of 0 
to 5 of cover of Nardus stricta, a grass of moorland. The Mead analysis 
indicates a high value at block size 4, but only at 10% overall significance; 
the analysis of all eight transects has both block sizes 2 and 4 significant at 
5%. The spectral analysis shows a dominance by low frequencies and a 
peak at about equivalent block size 8, which is maintained in the analysis 
of all eight transects. The Pi/rt has a low value at block size 1, perhaps 
revealing the non-Poisson character of the observations. The value at size 
2 is typical of a x2, distribution whereas sizes 4, 8, 16, 32, and 64 all give 
extremely high values, comparably so according to F tests. This suggests 
significant effects at sizes 1, 2, and 4; this analysis is probably detecting the 
structure of the nonempty quadrats. 

Our final analysis is a transect of 512 quadrats measuring cover of Lotus 
corniculatus, Bird’s-foot Trefoil. Only block size 2 of the Mead values is 
significant at an overall 5% level, although the value at 32 comes close. 
The spectral analysis suggests merely an overall decrease of power with 
frequency, so counts in neighboring quadrats are quite highly correlated. 
Both the Greig-Smith and Pirt analyses suggest that the two halves of the 
transect are quite different in character, and have particularly high values 
at block size 32. 

These analyses are rather disappointing but spectral analysis seems the 
most reliable. 


6.5 TWO-DIMENSIONAL EXAMPLES 


We reanalyze four of the examples of Zahl (1974, 1977) to compare 
Greig-Smith, P/rt and spectral analysis with his results. The first two 
examples illustrated in Figures 6.9 and 6.10 are from Thompson (1958) and 
are counts in ]-meter squares. The others shown in Figures 6.1! and 6.12 
are counts in 3X2-meter rectangles. Zahl’s conclusions from his own 
method were patches of sizes 2X2 in Figure 6.9, 6X6 in Figure 6.10, 22 
in Figure 6.11, and about 10 10 in Figure 6.12. All four examples are on 
a 16X16 grid. The spectral analyses reveal a smooth isotropic spectral 
density that decreases steadily with frequency and isotropic locally positive 
correlations, except in Figure 6.11, which shows a marked anisotropy and 
a significant peak in the spectral density at a frequency (67 /8,0), a period 
of 2.7 meters in the x direction. 
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Fig. 6.9 Thompson (1958) sample A2 of bush clover. (a) Data. (b) Correlogram. (c) 
Periodogram. (d) Smoothed spectral density on log; scale with 95% confidence interval 
range of 0.38. Other details as Figure 5.1. 
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Fig. 6.10 Thompson (1958) sample A3. Details as 6.9. 


(d) 
Fig. 6.10 (continued ) 
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Creosote bush sample ARIZ1 from Zahl (1974). Details as 6.9. 


Fig. 6.11 
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(d) 
Fig. 6.11 (continued) 
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Creosote bush sample ARIZ4 from Zahl (1974). Details as 6.9. 


Fig. 6.12 
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(d) 
Fig. 6.12 (continued) 
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Table 6.2 Mean Values for Square Blocks in Greig-Smith’s 
Analysis for the Data of Figures 6.9a-6.12a 


Figure 
Block Size 6.9 6.10 6.11 6.12 
8x8 9.10 3.77 1.27 105.0 
4x4 0.77 6.45 1.47 11.2 
2x2 0.61 0.72 0.78 4.39 
1x1 0.50 0.64 0.66 2.73 


Table 6.2 gives the mean squares for square blocks. For the first 
example the large mean square at 8 <8 is due to the division into quarters, 
not into either 8 X 16 or 16 X8 blocks, as should be clear from Figure 6.9a. 
For the fourth example the large reduction at 8X8 is due entirely to the 
division into the front and rear halves. This example shows signs of 
heterogeneity both here and in the large power at low frequencies in the 
spectral analysis. 

Table 6.3 gives the deviances for the Pirt analysis, expressed as standard 
Normal deviates. Both the first and third examples fit the hypothesis of a 
random pattern giving independent Poisson counts with the same mean. 
However, the improvement in fit when the four 8x8 blocks are allowed 


Table 6.3 Values of \/(—47,)— \/(2v—1) for Pirt Analyses® 


Figure 
Block Size 6.9 6.10 6.11 6.12 
16x 16 1.69 4.16 1.17 8.38 
8x 16 1.02 4.79 1.22 8.41 
16x8 0.74 4.12 1.07 4.55 
8x8 —1.97 3.57 1.07 3.98 
4x8 —2.02 1.60 0.99 3.60 
8x4 —2.48 1.86 0.55 3.54 
4x4 — 2.48 —0.70 0.47 2.66 
2x4 —2.74 — 0.86 0.18 2.83 


4x2 — 3.68 — 1.34 0.50 1.74 


*y is the number of degrees of freedom. Variables given have 
asymptotically independent standard Norma! distributions. 
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different means in 6.9 is extremely significant. Other significant improve- 
ments in fit are 4x4 blocks in the second example, 8 x 4 blocks in the third 
and the splits into halves and quarters in the fourth. (In this example a 
generalized distribution was assumed and F ratios were used.) The peri- 
odicity found by spectral analysis is unlikely to show up in these analyses 
but is perhaps reflected in the significance of the 84 block size. Zahl’s 
own results are rather different. (Note that revised conclusions for the 
Creosote Bush examples are given in the 1977 paper.) 


We noted in Section 5.2 that Besag (1974) fitted an autologistic model to 
a grid of counts. 
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CHAPTER 7 


Field Methods for 
Point Patterns 


The quadrat counts discussed in Chapter 6 are one class of methods which 
can be used to census a collection of plants or to investigate its pattern. 
Quadrats are reported to be difficult to use in forestry and “distance 
methods” have been used for decades by foresters. Their basis is the idea 
that if the forest is dense distances measured from a point or tree to the 
nearest tree will be small, so measuring such distances should allow an 
estimate of the number of trees per unit area to be made. However, it was 
found that the estimator should depend on the pattern of the trees. One 
line of research has been to find robust estimators which are little affected 
by the pattern. The other was to take two estimators and compare them. 
If they react differently to certain aspects of the pattern the companson 
should give a measure of the pattern. Usually this is phrased as “testing 
for randomness,” which means testing the null hypothesis of a forest 
generated by a Poisson process. The class of possible processes (which we 
consider in more detail in Section 8.4) is so large that a single test statistic 
gives insufficient information about alternative patterns. The methods of 
this chapter are best suited to preliminary investigations. A detailed study 
needs a map of the population and the methods of Chapter 8. 

Distance methods have not been totally satisfactory in forestry, so 
Section 7.2 discusses the alternatives used. The preferred method is based 
on searching not for the nearest tree, but for all trees out to a fixed 
distance. These methods are also unsuitable for mobile animals, which 
are often censused from a line transect through the population. Distances 
are measured from the line to the animal’s location when it is spotted or 
flushed. Section 7.3 presents the theory behind this method. 
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7.1. DISTANCE METHODS 


To reduce the load on the word “point,” we will suppose our point pattern 
to be composed of trees. Figure 1.la is an example. The measurements 
made are of two basic types—distances from a sample point to a tree or 
from a tree to a tree. Sometimes “distance methods” and “nearest 
neighbor methods” are used to distinguish the two types, but we will 
regard these terms as synonymous. Figure 7.1 illustrates the measure- 
ments that have been suggested. Figure 7.1a, b shows the two simplest, 
from a randomly chosen point and tree to the nth nearest (distinct) tree. 
It is natural to work with squared distances, for 7 Xdistance? is the area 
searched in concentric circles from the sample point or tree until the nth 
nearest tree is found. To randomly select a tree it is usually thought 
necessary to enumerate all the trees in the study area and pick one from a 
table of random numbers. This defeats the whole aim of distance meth- 
ods so such measurements have not been used in practice and the last two 
suggestions are alternatives. In Figure 7.1c the sample tree is taken as 
that nearest to the sample point. This is not a tree chosen at random 
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Fig. 7.1 Types of distance measurement. @ denotes a tree, < a sample point. (a) Point to 
nearest tree. (b) Tree to nearest tree. (c) Tree A to nearest tree, where A is the nearest tree to 
the sample point. (¢@) T-square sampling. 
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a Fig. 7.2 A clustered pattern. The center tree of a cluster will be 
oe. chosen by the scheme of Figure 7.1c much less often than will 
. those on the outside. 


(Figure 7.2 shows that some trees will be chosen more often than others), 
and the two measurements are not independent. T-square sampling 
(Besag and Gleaves, 1973) allows a search only over a half-plane to ensure 
that the areas searched in the two measures cannot overlap. 


Distribution Theory 


The distribution theory for these measurements is simple only for a 
homogeneous Poisson process. Suppose we have a Poisson forest of 
intensity A. If u, =7d? for the distance d from any fixed point to the nth 
nearest tree 


P(u, >t?) =P(no more than n— | trees in a ball of radius ¢) 


n-1 


= 5) exp{—Amt?}(Amt?)’/r! (7.1) 
0 


so u, has an exponential distribution rate A. Furthermore, since the areas 
searched u,,u,—t,,u,;—U,..., are disjoint, these are independent ex- 
ponentials each of rate A. Hence u, has a gamma (A, 7) distribution, as 
can be shown directly from (7.1). These results hold for any fixed sample 
point and hence also for a random sample point chosen independently of 
the process. 

The same distribution theory holds asymptotically for measurements 
from a randomly chosen tree. Suppose we have N trees within the region 
D from which the tree was chosen. Let us argue conditionally on N and 
consider the sample tree separately; the other N—1 trees are independently 
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and uniformly distributed within D (Section 2.3). Then if A =area (D) 
P(no trees in an area a|N> 1)=E[ (1 ~a/A)"""|N> 1| 


={e~*—e 4} /{(1-e-*4)(1-a/A)} 
(7.2) 


We must condition on N>1 to have at least one tree from which to 
choose. We can approximate (7.2) by e~**, provided a«A and E(N)= 
AA is large enough (say >10). These conditions will be met in practice 
and lead to the same distribution theory for u, as before. 

The distribution theory of the measurements in Figure 7.1c is not simple 
and has been worked out for a homogeneous Poisson forest by T. F. Cox 
and T. Lewis (1976). They find a function R of the two measurements, 
which is uniformly distributed. However, the modification to 7-square 
sampling does have a simple distribution theory. Because the areas 
searched are disjoint they have independent exponential distributions with 
rate A. 

We noted that selecting a random tree was impracticable. Byth and 
Ripley (1980) introduced a scheme to select a tree with a modest amount 
of counting. Suppose D is divided into regions E and F= D\E and that 
the random tree is chosen from those within E. The Poisson processes on 
E and F are independent. By the arguments used above the probability 
that the disk C of radius ¢ centered on a random tree in E does not contain 
further trees is 


P(C is empty|N(E) > 1)=e~**(e~** —e-*) /{(1-e-*)(1-b/e)} 


where a=area (C\E), b=area (CONE), e=area (E). If we have b<e, 
Ab<he, then 


P(C is empty| N( E) > 1)~exp{ —Aat?} (7.3) 


from which we again deduce an exponential distribution for the area 
searched to the nearest tree. Suppose we want m samples of random 
tree-to-tree distances. Choose E large enough to contain 5m trees but 
divided into m widely separated small regions. Then b=e/m and A(e—b) 
svAex5m; so (7.3) holds. 

In deriving (7.1) and (7.3) we have assumed that we could, if necessary, 
measure distances from points within the study region D to trees outside 
D. To avoid this we usually select a random point or tree from a 
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Fig. 7.3 The recommended sampling plan. Point-to-tree distances are measured from crosses, 
random trees chosen without replacement from those within the circles. 


subregion Dy away from the boundary of D. A further constraint is 
imposed if we wish to take more than one sample. To be able to regard 
these as independent the probability that the areas searched might have 
overlapped must be negligible. This is most easily achieved by taking a 
regular grid of sample points. The recommended sampling plan is shown 
in Figure 7.3. For measurements as in Figure 7.la,c or 7.1d start from 
the regular grid. For both point—tree and tree—tree measurements use 
some grid points as sample points and the remainder to center small areas 
within which we enumerate the trees and then draw them at random 
(without replacement). A regular coverage should give the most accurate 
estimate of the average behavior over the region D by analogy with the 
theory of Chapter 3. 


Intensity Estimation 
Let d),...,d,, be the distances from m sample points to their nth nearest 
trees. For a homogeneous Poisson process, the maximum likelihood 
estimator of A (based on dj,..., d,,) is 

A=mn/2>.d? (7.4) 
This is not unbiased, but 6=1/X estimating the area per tree 0 is unbiased. 


Note that @ is just the average area searched per tree found. First 
moments have also been used. For n=1, the estimator 


h=1/(2D04,/my 
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has been proposed, because E(d,)=1/2VA. Both A and X are consistent 
as m—>0o for a Poisson forest, but A is marginally more efficient (Kendall 
and Moran, 1963, p. 38). Unfortunately, the constants 7/n and 2 are only 
appropriate for a Poisson forest. Many researchers have evaluated the 
appropriate constants for triangular and rectangular lattices and for dis- 
tances measured only in some angular sector from the sample point. 
Holgate (1972), Persson (1964, 1965, 1971), and Cox (1972) provide some 
recent surveys and comparisons. None of these estimators is particularly 
robust. 

The other approach has been to combine estimators, taken by Diggle 
(1975, 1977a), Lewis (1975), Cox (1976), and Aherne and Diggle (1978). 
For a regular pattern distances from points to trees and from trees to trees 
will have, respectively, smaller and larger means than for a Poisson forest 
of the same intensity. The reverse is true if the trees are clustered. 
Clearly, some combination of intensity estimators based on the two types 
of measurement might be more robust than one based on just one type. 
Let u; and v, be the areas swept out by each of m point—tree and tree—tree 
measurements. Diggle recommended as a result of a mainly simulation 
study 


“3 m Pe m 
A,=zm/ du, A,=m >d 2; (7.5) 
1 1 


Aherne and Diggle recommend using (7.5) if the tests of randomness 
discussed later reject a Poisson forest; otherwise (7.4) applied to all 2m 
squared distances (the harmonic mean of A, and A ,). An intuitive reason 
for using the geometric mean is that for ‘clustered patterns A, tends to 
measure the intensity of clusters and i, the numbers within clusters, so a 
multiplicative correction may be appropriate. Diggle’s conclusions are 
based mainly on biases of estimates of 92=1/A. For usual values of m the 
sampling fluctuations of A, and A , about their means will be small, so the 
biases in @ and A are closely related and good measures of robustness. 
The tree—tree measurements can be made by random selection of trees, by 
the Byth-Ripley scheme or by T-square sampling. Diggle preferred T 
square for its practicality. If the forest has a markedly varying intensity a 
systematic sampling plan will be beneficial. 

Batcheler and co-workers have taken yet another approach (see the 
references at the end of this section). 
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Tests of Randomness 


Many tests of the null hypothesis of a Poisson forest based on nearest- 
neighbor measurements have been proposed. Three recent comparative 
studies are given by Diggle et al. (1976), Hines and Hines (1979), and Byth 
and Ripley (1980). Let u,, v;, ut;, ot; be the areas swept out in point-—tree, 
tree—tree, and in point-tree-tree 7-square sampling, and u‘") the areas 
swept out in finding the nth nearest tree to a point. All these measure- 
ments are 7 times a squared distance except vf;, for which the multiplier is 
a/2 (since only a semicircle is searched). Some of these tests are: 


Skellam-Moore 
m 
n>. u, /area(D) 
1 


where D contains n points. For a Poisson forest this has a x(q) distribu- 
tion. Knowledge of 7 is unlikely and Pielou (1959; see also Mountford, 


1961) suggested replacing n/area (D) by a quadrat count estimate of the 
intensity. 


Hopkins 


m m 
Hop, = du; | 3a 
1 1 


Hopy = 2 {ui/(u;+%4)} 


were proposed by Hopkins (1952) and Byth and Ripley (1980), respec- 
tively, having F(2m,2m) and N(3,1/12m) null distributions (in fact, the 
latter is a very good approximation to the average of m U(0, 1) variables). 
In Hop, the pairing of u; and »v; is arbitrary. 


Holgate 


Hol, = Sa / : (u? —u;) 


Holy =>, {u,; /u} 
1 
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proposed by Holgate (1965a), which have the same null distributions as 
Hop, and Hopy. 


T-square 
tp= Dut, / Dd vt; 
ty = > (ut; /(ut; +0t;)} 


tp =2m D (ut, +01,)/{ Dd (vut; + vot,)}" 


The first two were introduced by Besag and Gleaves (1973) and have the 
same null distributions as their Hop and Hol counterparts. {, is a modifi- 
cation by Hines and Hines (1979) of the statistic of Eberhardt (1967) and 
can be viewed as a test of the exponential distribution of ut and vt. Hines 
and Hines give a table of percentage points for the null distribution of f,. 

Cox and Lewis (1976) considered the average and minimum of their 
statistics R;. Hines and Hines showed that the average is very similar to 
ty. Both Pollard (1971) and Diggle (1977) used Bartlett’s test for homo- 
geneity of variances, recognizing that the areas searched have x2» /r 
distributions, so testing the equality of A for the observations provides a 
test for the homogeneity of a Poisson forest. 

The comparison studies show that Holgate’s tests are not competitive 
with their Hopkins and 7-square analogues. Among T-square tests, t, 
seems the best overall, being comparable to Hop, for clustered alternatives, 
but worse than Hop, at detecting regular forests. Hop; and Hopy lose 
power only slightly when the sampling plan of Figure 7.3 is used, which 
needs only a small amount of enumeration and makes them practicable. 
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7.22 FORESTRY ESTIMATORS 


The tessellations and triangulations introduced in Section 4.3 can be used 
to give estimates of the intensity A of a forest. For the Dirichlet cells the 
average area is A/N for N trees in a region D of area A, simply because 
the N cells partition D. If we ignore edge effects this is also true of the 
about 2 Delaunay triangles (Fraser and van den Driessche, 1972). Thus 
if we could select a random cell or triangle and measure its area we would 
have an unbiased estimator of the area per tree. All we can do, however, 
is to select a random point within D and measure the area of the cell (or 
triangle) containing that point. Let the areas be a,,...,a,y. Then the 
probability of selecting the ith cell is a,/A, and the expectation of 1/a; is 
2(1/a;)a;/A=N/A. Thus 1/area for a cell selected in this way is an 
unbiased estimator of A for any homogeneous pattern. For cells this idea 
is due to Ord (1978). Finding the Dirichlet cell or Delaunay triangle 
containing a point is a local procedure. (The nearest tree defines the 
Dirichlet cell and is one of the vertices of the Delaunay triangle.) How- 
ever, constructing them in the field and measuring their area is clearly 
difficult. 

Foresters have found laying out quadrats and actually counting accu- 
rately difficult in dense forests. Distance methods provide insufficiently 
robust estimators and the ideas (proposed by foresters) described at the 
beginning of this section are too intricate. They are usually most inter- 
ested in the volume of timber in a tract of forest and use the Bitterlich 
(1948) method. This is discussed by Grosenbaugh (1952a,b), Shanks 
(1954), Kendall and Moran (1967, p. 47), Holgate (1967), and Ord (1978). 
For trees of circular cross section and equal diameter p it essentially uses a 
circular quadrat and so gives an unbiased estimate. The observer has an 
instrument known as a relascope, which enables him to decide whether the 
angle subtended by a tree trunk is greater than 2a. From Figure 7.4 we 
see that this is so if the tree is nearer than p/(2sina), so the observer 
counts those trees in a circle of this radius, and estimates A by number x 
Asin’a/7p’. If the trees are of different or noncircular cross sections their 
apparent diameters p, can be measured and 


n 
>D 4sin*a/mp? 
1 


formed. In either case, 4nsin’a is an unbiased estimate of the basal area 
per unit area of forest, and when multiplied by the area of the tract and an 
average useable height of a trunk gives an estimate of the timber volume. 
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Fig. 7.4 The geometry of Bitterlich sampling. 


Obviously, a should be taken as small as possible to include many trees. 
However, nearby trees might obscure distant trees and Holgate points out 
that errors in a become increasingly important at small a. A compromise 
of ax2° seems satisfactory. Of course, the variance of the estimate 
depends on the pattern. In practice, the counting will be repeated at 
several observation points and the sample variance can be used to estimate 
the sampling errors. 
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Mobile anima! populations provide rather different problems of censusing 
to those of forests. Two common examples are counting deer and game 
birds, both of which will usually flee as an observer approaches. This 
flight mechanism is used in the line-transect method. An observer walks 
a long transect through the area to be censused and whenever an animal is 
detected (usually by being flushed) he marks the spot and measures either 
the distance d from his present position or the right-angle distance x to the 
animal from the transect line (see Figure 7.5). During the course of a 
transect of length L he spots n animals and records (x,,...,x,) OF 
(d,,...,d,). If all animals were detected out to distance W and those 
beyond ignored, we would have a strip transect and the obvious estimator 
of A, the number of animals per unit area, is 


A=n/2LW (7.6) 


This is effectively a quadrat estimator, using a Lx2W rectangular area. 
The line transect methods find an equivalent of the half-width W. Many 
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Fig. 7.5 Measurements made in line transect sampling. The observer spots an animal at B 
when at 0. 


slightly different sets of assumptions have been proposed. We certainly 
need: 


1. Animals only move after detection. 

2. The detections of animals are independent and no animal is de- 
tected more than once. 

3. The detection probability is a function g(x) of the right-angle 
distance x from the transect, and g(0)=1. 


Unless detection of animals on the transect is sure we will never know 
what the missed proportion of animals is. 

Some other assumption on the pattern of the animals is required. They 
can be assumed to follow a Poisson or binomial process or the pattern can 
be considered fixed and the transect positioning random. 

Suppose for the moment that g(x)=0 for x>W. Then if X is a 
right-angle distance 


P(X <x| an animal is observed) 


= P(X <x, animal observed) / P(animal observed) 
={fenasw}/{ fsoo/¥| 


Thus the appropriate pdf of X when animals are uniformly distributed in a 
strip width 2W is 


Ax)=a(x)/u um f "e(y) dy (7.7) 


We may now let Woo. Then from (7.7) the observed right-angle dis- 
tances (X,,..., X,) are independent with pdf g(x)/p, conditional on n. 
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However, for a binomial or Poisson process 


E(n)=2ALp=2AL/f(0) 
so 


=nf(0)/2L (7.8) 


is an unbiased estimator of A. Of course, (7.6) is just the special case 
g(x)=1 for 0<x< W, g(x)=0 for x>W. 

Note that g differs from a pdf by the scale factor » which we must 
determine, and that to use (7.8) we must estimate f(0) for a pdf f, from 
which we have n observations (x,,...,x,). Both parametric and nonpara- 
metric methods have been nropased to estimate f(0). Suppose that we 
have an approximately unbiased estimator fO) and nvar( FO) |n)xwo?. 
Then for a Poisson pattern 


E(A)xA 
var(A)~| F(0) var(n) + E{ n?var( f(0)|n)} /eLy 
wA(1+y707)/2nL 


Such theoretical variance formulas depend rather heavily on the assump- 
tions, so the variability of an estimator of A is probably better assessed by 
the sample variance of estimators from different transects. 

For parametric estimation it is common to take either g(x) as the tail 
area for a family of pdfs or f(x) as one of a family of pdfs. Forms used 
are: 


g(x)=e7™, f(x)=ye"™ Gates et al. (1968) 
a(x)= 1=(x/W)"  0<x<W Eberhardt (1968) 
0 x >Ww 
g(x)=exp{ —(x/B)* Pollock (1978) 
Weibull tail area Ramsey (1979) 
oo 
a(x)= f Byte -P dy Sen et al. (1974, 1978) 


g(x) =exp{ —x?/207} 


fx) = v (2/n0?) exp{ —x2/207} 
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Except for the exponential g and half-normal f, the corresponding families 
for f are not quite the usual probability densities and methods of maximum 
likelihood estimation have been developed in the papers cited. These 
yield estimators of f(0), which are then used in (7.8). One of the simplest 
nonparametric methods yielding the “Kelker Index” is to suppose on 
biological grounds that g(x)= 1 for 0< x < w and use (7.6) with W replaced 
by w and n the number of distances measured that are less than w. A 
“jacknifed” form is to use 


A=(3n,—n,)/(4Lw) 


where n, and n, are the numbers of distances in (0,w) and (w,2w) 
respectively (Eberhardt, 1978). Anderson and Pospahala (1970) suggested 
a local quadratic form for f; fit 


f(x) = by + b,x? 


and use fO)=bp. 
Burnham and Anderson (1976), whose general approach we have fol- 
lowed in deriving (7.8), suggested 


A i 
fl0)= max {| Xay XQ S71 LXiyy 


as robust to animals moving away from the transect before detection. 
Their main intention seems to be that modern probability density estima- 
tion techniques should be used to give a nonparametric estimate of f but 
these do not seem yet to have been tested. 

So far we have assumed that right-angle distances are available. For 
radial distances d we consider g(x, d) as the probability of detection at 
right angle distance x and radial distance d. Then 


S(X)=E{f(x|D)},— 1/n=f(0) 
where f(x|d) is the conditional pdf of x given d, and we can estimate f(0) 
by the obvious estimate, suggested by Burnham and Anderson (1976), 


fo)=, DIO14,) (7.9) 


if the form of f(x|d) is known or can be estimated. The estimator of 


LINE TRANSECTS 143 


Hayne (1949) assumes a circular flushing area for each animal, so 
f(x|d)=1/d x<d, 0 x>d 


and 


i-(5 1/4.) /2L =n/2Lh 
1 


where A is the harmonic mean of the d;. Gates (1969) assumed g(x, d)= 
e~%?, 0<x<d, which would again give Hayne’s estimator (since each 
animal has a circular flushing area of exponentially distributed radius). 
However, maximum likelihood estimation and some bias correction led 
Gates to 


A\=(2n—1)/2Ld 


Thus the “obvious” form (7.9) need not be the best. 

The comparative studies available for this wealth of estimators are not 
satisfactory. Most are not comprehensive and have concentrated on a 
narrow class of mathematical models. Robinette et al. (1954, 1956, 1974) 
have constructed field trials in which Anderson and Pospahala’s and 
Kelker’s nonparametric methods were preferred to parametric methods. 
It seems that until the recent introduction of the Weibull and gamma tail 
areas, parametric methods have been insufficiently flexible. 
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CHAPTER 8 


Mapped Point Patterns 


Maps of small objects that may be considered as points are common; we 
may think of towns, forts, and stores in geography and archeology, plants, 
and nests in ecology, galaxies in astronomy, and many other examples. 
Throughout this chapter we will assume that our raw data are a map (or a 
list of coordinates) giving the locations of all N objects within a region D 
of known area A. Typically N will be less than 100. Thus it is essential 
to use all the available information efficiently. There are examples in 
which it is difficult to define the study region; Brown (1975) and Newton 
et al. (1977) discuss the locations of birds’ nests that can only occur in a 
particular habitat within the region (this habitat had not been accurately 
mapped). 

The assumptions of homogeneity and isotropy have only been in the 
background in Chapters 6 and 7, where we estimated the intensity aver- 
aged over D, or tested for departures from a homogeneous Poisson 
process. In this chapter we will always assume homogeneity and usually 
isotropy. For the continuously distributed observations of Chapters 4 and 
5 we could achieve stationarity by transformations or by removing a 
deterministic trend. For point processes our sole recourse is to analyze 
small areas intensively that we assume to be homogeneous when looking 
for interactions between objects, and to use large scale analyses such as 
those in Section 6.5 to detect intensity variations within our study region. 

There is a fundamental problem here. We shall see that there are 
mathematical models for the ideas of a Poisson process with intensity 
varying from place to place, and of cluster processes in which daughter 
objects are distributed about parent objects, which have identical distribu- 
tions and so cannot be distinguished by any amount of data. The 
observer must bear in mind that what is considered to be a clustered 
pattern with the assumption of homogeneity in force could also be the 
result of heterogeneity. 

Figure 8.1 illustrates a broad classification of point patterns. All are 
computer simulations of the processes detailed in the caption. Figure 8.1a 
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Fig. 8.1 Simulations of 100 points from (a) binomial process (5) cluster process with 20 
parents and cluster radius 0.05 (c) cluster process with 10 parents and cluster radius 0.25 (d) 
heterogeneous binomial process (e) randomly packed centers of disks of diameter 0.06 (/) 
randomly packed centers of ellipses with horizontal major axis 0.06, vertical minor axis 0.04. 
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is a realization of a Poisson process. Remember from Section 2.3 that 
conditionally on the number of points they are independent and uniformly 
distributed. Most people feel that a “random” pattern should not be so 
chaotic with such close pairs of points and large empty spaces. Both 
Figure 8.16 and 8.1c are from cluster processes with (on average) 20 and 10 
clusters, respectively. In the latter, the clusters are so large that we would 
probably regard the pattern as heterogeneous. Figure 8.ld is a more 
clear-cut case of heterogeneity with a clear vertical trend in intensity in a 
Poisson process. Finally, Figure 8.le, f represents realizations of processes 
with inhibitions between the objects, which we will call regular patterns. 
The difference between them is that Figure 8.1 f has a subtle anisotropy in 
the interactions and so has different properties in the horizontal and 
vertical directions (try turning the page 90°). The human eye (plus brain) 
iS quite adept at classifying point patterns but is easily fooled. (Anyone 
who does not believe this should look at the examples on pp. 5-8 and p. 32 
of Hodder and Orton, 1976.) 

There are several distinct approaches to the analysis of mapped point 
patterns. A traditional view has been to sample by quadrats, represented, 
for example, by Rogers (1974), covered in Chapter 6. Perhaps the com- 
monest approach is to use tests of “randomness” based on nearest-neighbor 
distances such as the test of Section 7.1. These are often misapplied, for 
to measure the distance from every object to its nearest-neighbor violates 
the sampling assumptions made there. In Section 8.2 we shall discuss 
what can be salvaged for mapped point patterns. 

To measure all nearest-neighbor distances we will have to measure all 
small distances between pairs of objects, so it seems sensible to use all this 
information. This leads us to methods based on the distribution of the 
distances between pairs of objects. Somewhat surprisingly, this is closely 
related to the second-moment properties of the counts N( A) of the number 
of points in a set A within D. We can always find M(A) by adding the 
contributions from many subsets of A of small area. For small enough A 
and B, the noncentered covariance 


E(N(A)N(B))P(N(A)>0, N(B)>0) 
avg(x,y) Xarea( A) Xarea( B) (8.1) 


for points xEA, yEB. For a homogeneous, isotropic process g(x, y) must 
be a function of the distance d(x,y), and can be thought of as a density of 
the interpoint distance distribution. This approach is taken further in 
Section 8.3. An alternative view of the second moments is to estimate the 
coefficient of variation var( N(A))/E(N(A)) for a square A. This is akin 
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to the David—Moore index we discussed for quadrat counts. With a 
mapped pattern it is possible to average analytically over all positions of 
the quadrat A and so to reduce the sampling variability. 

Measurements on the Dirichlet cells and Delaunay triangles (see Section 
4.3) defined by the pattern have been advocated by a number of authors 
[Boots (1974, 1975), Getis and Boots (1978), Mardia et al. (1978), Mollison 
and the reply in Ripley (1977), Vincent et al. (1976a) and Smalley (1966)]. 
Miles (1970) summarizes the known moments results for a Poisson process. 
The suggestions include the variance of the area of the Dirichlet cells and 
angle measurements on Delaunay triangles. There are no published com- 
parative studies, but unpublished work by P. J. Green (abstract: Advances 
in Applied Probability 10, p. 493, 1978) showed these ideas to be good, but 
not the best available. 

With mapped data we can do much more than just test for “randomness.” 
In Section 8.4 we consider mathematical models that formalize the ideas of 
clustering, inhibition, and heterogeneity, which have been in the back- 
ground of Chapters 6 and 7. The estimation of parameters of these 
models is a largely unsolved problem. Some general ideas are presented 
in Section 8.4 and applied in Section 8.6 to several examples of complete 
analyses of point patterns. 


8.1 BASIC PARAMETERS 


Suppose we have observed N objects x,,...,x,y within a “reasonably 
compact” domain D. For definiteness we will work in the plane unless 
otherwise stated. Several mathematical descriptions of these observations 
are possible and these suggest various analyses. We couid use the coordi- 
nates of the object, but any analysis would have to ignore the order in 
which the objects are labelled and respect the supposed invariance of the 
process under rigid motions of the plane given by homogeneity and 
isotropy. Multidimensional scaling considers the reduction of informa- 
tion to the set {d,;} of distances between the objects; from this it is 
possible to recover the locations of the objects up to a rigid motion 
(Torgerson, 1958). We could further discard the labels that tell us which 
distance is associated with which pair of objects. This reduces our infor- 
mation to the interpoint distance distribution, which as we saw in the 
chapter introduction has been used as a summary statistic (Bartlett, 1964). 
A further way to condense {d;;} is to consider {d; =min, d; ;}, the set of 
distances from each object to its nearest neighbor. This approach is taken 
up in Section 8.2. 
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The most theoretically useful approach to point processes has been via 
the description by NM(A), for all bounded Borel sets A. The reader is 
referred to the survey papers of Daley and Vere-Jones (1972) and Fisher 
(1972) and to the advanced monographs of Matthes et al. (1978) and 
Kallenberg (1976) for the full power of this approach. Carter and Prenter 
(1972) and Ripley (1976b) show the formal equivalence of all the ap- 
proaches sketched here. 

From the counts the natural parameters to consider are the moments. 
By homogeneity E( N(A))=Av(A), where A is a constant, the intensity, and 
y(A) the area (in R?) or volume (in R*) of the set A. The reduction of the 
second moments by homogeneity and isotropy is more difficult; less 
mathematical readers may wish to skip the rest of this paragraph. The 
noncentered covariances E(N(.A)N(B)) can be reduced to a nonnegative 
increasing function K on (0,00) by 


E(N(A)N(B))=Ar(ANB)+® [7,4 xB)dK(t) (8.2) 
0 
where 

v{(AXB)= falty-xives, d(x,y)=1} | dv(x) 


and o, is the uniform probability on the surface of the sphere of radius ¢ 
centered at the origin (Ripley, 1976a). The following remarks may help 
give the intuitive content of (8.2). The first term comes from objects 
counted in both A and B, the second from distinct pairs of points xEA 
and yEB. By the assumed homogeneity and isotropy the only relevant 
characteristics of (x,y) are those invariant under rigid motions, hence 
functions of the distance d(x,y). The measure », formalizes the idea of 
one object uniformly distributed in the space with the second uniformly 
distributed distance ¢ from the first. 
Certain special cases give intuitive interpretations of K: 


1. ?K(t) is the expected number of (ordered) pairs of distinct points 
not more than distance ¢ apart with the first point in a set of unit area. 

2. XK(t) is the expected number of further points within ¢ of an 
arbitrary point of the process (see Ripley, 1977, p. 190). 

3. From (8.1) g(d(x,y)) is given by g(t) =(A? dK/dt)/c(t), where c(t) is 
the length or area of the surface of a ball of radius ¢. 


For a Poisson process we find K(t) is the area or volume of a ball of radius 
t (nt? or 47t?/3) as we might expect from 1, 2, or 3. The power of A in 
the definition of K(1) was chosen to give this result. 
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The higher moments of {N(A)} reduce in a similar way, but to unwieldy 
functions. There is some evidence (Julesz, 1975) that the human eye is 
most aware of the second order properties of planar point patterns. 

Yet another description of a point pattern is obtained by retaining only 
whether A contains an object, that is replacing M(A) by min (N(A), )= 
I(N(A)>0). This point of view will be extended to random sets in 
Section 9.1. We do not need to consider all Borel sets A; the distribution 
of the stochastic process {N(A)} is specified by giving 


P(N(A,)=0,..., N(A,) =0) 


for all finite collections (A,,..., A,) of balls (Ripley, 1976b). The “first- 
order” part of this description would be 


p(t)=P(N(ball center x, radius 1) >0) (8.3) 


which by homogeneity does not depend on x. However, p(t) is the 
cumulative distribution function of the distance from x to the nearest 
object. Cowie (1967), Roder (1975), and Diggle (1979) have suggested 
considering the analogous distribution for object—nearest-object distances. 

Gaussian processes are characterized by their mean and covariance 
functions. No combination of the parameters presented in this section 
characterizes an interesting class of point processes, yet in suitable combi- 
nations they can provide insights into which mechanisms could and could 
not have generated an observed point pattern, as we shall see in Section 
8.6. 


Edge Effects 


Interpretation (1) of K shows that it is closely related to the distribution F 
of the distances of pairs of points within D. The latter depends strongly 
on the shape and size of D. For a (homogeneous) Poisson process it is the 
distribution of the distance between a pair of independent uniformly 
distributed points in D, formulas for which were given in specific cases by 
Borel (1925) and for domains with smooth boundaries (an unstated condi- 
tion) by Geciauskas (1977). 

The effect of the edge of the domain D becomes increasingly dominant 
as the dimension increases. A number of special edge corrections are 
considered for individual methods. In addition, there are two general 
approaches: 


1. Consider D within some larger domain D* and allow measurements 
from objects in D to objects in D*. Usually this can be achieved only by 
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Fig. 8.2 An interpretation of toroidal edge correction. Distances may be measured from a 
point in the central rectangle to points in the surrounding rectangles. (From Ripley, 1979b.) 


defining a new study region inside the map as given. The effective sample 
size is then the number of points in D*. This method is usually referred 
to as allowing a border or guard area. 

2. Rectangular regions D can be regarded as a torus, so that points on 
opposite edges are considered to be close. An alternative interpretation 
illustrated in Figure 8.2 is to regard D as part of a grid of identical 
rectangles, forming a border with copies of the pattern inside D. 


8.2 NEAREST-NEIGHBOR METHODS 


In Chapter 7 ae mentioned the Skellam—Moore test; d*=N(a=fd?)/A is 
referred to a X@ yy distribution to test the null hypothesis of a Poisson 
process. This is, however, a misuse of the test, for the derivation of the x? 
distribution assumed that the squared distances d? and d? are indepen- 
dent. However, there is a nonnegligible probability that the areas searched 
for the nearest neighbors will overlap; Donnelly (1978b) has shown by 
numerical integration that the correlation between d? and d; 2 is enough ee 
reduce the variance of d* by about 16% from that predicted by the x? 
distribution (in two dimensions, using toroidal edge correction, for all V). 
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Brown and Rothery (1978) consider this statistic and two others intro- 
duced by Brown (1975)—the coefficient of variation S and the ratio G of 
the geometric mean to the arithmetic mean of {d?}. Brown needed a test 
for regularity of birds’ nests which did not depend on the area A of D, 
which was only vaguely specified. Both S and G have this property. 

Dependence is only one of the problems in applying these tests to maps. 
We also have to counter edge effects. Since we interpret 7d? as the “area 
searched” from the ith object we could substitute the actual area searched 
within D. Even with this edge correction, no adequate approximation to 
the sampling distribution of d*, S, or G is known, so only Monte Carlo 
tests (Section 2.5) are available. 

Clark and Evans (1954) advocated the average of the distances (d,) 
rather than that of their squares. Let d be this average. Then they 
introduced R=d/E(d;) as a “measure of nonrandomness,” large for 
regular patterns and small for clustered or heterogeneous data, and as a 
test statistic, 

CE= [4-2(4)] (8.4) 
V/var(d ) 


to be referred to the standard Normal distribution (although because of 
dependence this does not follow from the usual Central Limit Theorem). 
The expectation and variance are for a binomial process with N points. 
Ignoring both edge effects and correlations between d; and d,, we find 


E(d,)~0.5v/(A/N) 
var(d )~(4—71)A/4aN? ~0.0683A4 /N? 


from the moments of an exponential distribution, substituting A=N /A for 
A. Diggle (1976) and Donnelly (1978a) investigated the correlation be- 
tween d; and d; and found it to be small. Clark and Evans recommended 
a border to combat edge effects but many users of their test have since 
ignored that advice. The problem has been investigated by Matérn (1972), 
Persson (1972), De Vos (1973), Hsu and Mason (1974), Vincent et al. 
(1976b), and Donnelly (1978a, b), with various remedies. Donnelly showed 
that the distribution of dis Normal for all practical purposes for N greater 
than six, so all that is needed are approximations to its mean and variance. 
He found, by a mixture of numerical integration and simulation, 


E(d;)~0.5\V/(A/N) + (0.514+0.412/./N)P/N 
var(d )~0.0704/N? +0.037P\/A/N*/? (8.5) 
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for a rectangle D of area A and perimeter length P. (If toroidal edge 
correction is used, of course P=0.) Note that the effect of ignoring these 
edge corrections is to bias the test, so that regularity will be detected 
spuriously. The use of mean distances rather than mean squared dis- 
tances has been criticized as inefficient, for squared distances give the 
maximum likelihood estimator of A for a Poisson process. For a test this 
is irrelevant; d* and d can only be compared in simulation studies such as 
those reported in Section 8.5. 

At (8.3) we define p(t) as the cumulative distribution function of the 
distance from a point to the nearest object. The obvious way to estimate 
this for a homogeneous pattern is to find the empirical distribution 
function for distances from m sample points to their nearest objects. 
However, edge effects intervene, so for each sample point we measure the 
distance d, to the nearest object and r, to the boundary and compute 


B(t)= # {ild; <t<r,}/# {ilt<7,} 


Thus for each ¢ we only consider the subsample of points at least ¢ from 
the boundary (reply to Ripley, 1977). How should these m points be 
chosen? From the point of view of Chapter 3 we are sampling to find the 
average of a random function defined on D, the distance from a point to 
its nearest object. We would expect large positive local correlations, so 
the theory suggests systematic or stratified random sampling. Since a 
general-purpose program might meet a completely regular pattern it is 
probably best to use stratified random sampling. 

Baddeley (1980) has shown weak convergence of f to p at rate 1/\/N for 
a class of point processes but unfortunately the limit process is not a 
standard process even for a Poisson point process. Since the sampling 
variability will be proportional to |/\/m this suggests we take m propor- 
tional to N, at least for small N. For large N we will probably decide on a 
particular accuracy for p and choose m accordingly, irrespective of N. 
Figure 8.3 shows the results of a simulation experiment. First, 10 realiza- 
tions of a binomial process of 50 points were sampled by a 50X50 
stratified random sample; then, a single realization was sampled 10 times 
each by 1010, 3030, and 5050 samples. The results suggest that m 
in the range 10N to 20N with a maximum of m=1000 is sufficient. 
(Ripley, 1979c, recommended 10N, whereas Diggle, 1979, used 10x 10 for 
N=100, which seems insufficient. cf. Diggle and Matérn, 1980.) 

Bartlett (1974, 1975) looked at histograms of the distribution of {d,} 
object—nearest-object distances rather than point—nearest-object distances. 
He computed the Eberhardt (1967) index E=NSd?/(Zd,)’, the sampling 
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theory of which seems never to have been investigated. Diggle (1979) 
considered the empirical cumulative distribution function @ of {d,} (previ- 
ously suggested by Cowie, 1967 and Roder, 1975) and defined the test 
statistics 


d, =sup| p(t)—p(t)| 
d, =sup|q(t)—p(t)| 
d, =sup| p(t) — 9(2)| 


where p(t)=1—exp(—7N?7) is the approximate mean of p(t) and 4(t) for 
a binomial process of N points. The use of a Kolmogorov-type test is not 
particularly appropriate, as the standard error of p(t) will vary with ¢ as 
illustrated in Figure 8.3. Only Monte Carlo tests have been used, al- 
though Baddeley’s asymptotic theory suggests that \/Nd,, will have a limit 
distribution and a table of percentage points could be created. Diggle’s 
experiments suggest that d, is effective against clustering (or heterogeneity, 
for which p was originally introduced) and d, against regularity, d, being a 
compromise between the two. 

Note that f(t) could be computed analytically within each Dirichlet cell 
(for which the defining object is the nearest) and the results for each cell 
averaged, weighted by the cell area. The computation involved is heavy, 
but knowledge of the Dirichlet cell contiguities can be used to speed up the 
search for nearest neighbors. Suppose that the stratified sampling grid of 
points is scanned as in Figure 8.4. Then the nearest object is usually 
either the nearest object for the previous point or the object defining one 
of its contiguous cells. Further, this can be confirmed by checking if the 
point falls within one of these cells. Finding the contiguity information is 
an appreciable overhead, however, as the following timings show, for 


Fig. 8.4 Order of scanning of a grid for p. 
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N=50, m=500: 
Naive evaluation 2.4 sec 
Contiguity information 0.32 sec values for CDC 6400 
Computing distances 0.46 sec 


A nearly systematic sampling scheme brings computational as well as 
statistical benefits! 
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83 SECOND MOMENTS 


We have seen (in Section 8.1) the connection between second moments of 
counts and the distribution of the distances between pairs of objects. 
Suppose first that we have a guard area, and let F be the empirical 
cumulative distribution function of distances from points in D to distinct 
points in D or in the guard area. The size of the border must be sufficient 
to include all points not more than 9 away from D. The sample size is 
unclear, so suppose for the moment F has jumps of size 1. Then, from 
(8.2), 


E( F(t)) = E(number of pairs, one in D, second within ¢ of first) 
=ANK(t)  fort<ty (8.6) 


Clearly, we should rescale by A~?, but A is unknown. An unbiased 
estimator of A is 


\=N/A 
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whereas for a Poisson process M(N — 1)/A? is an unbiased estimator of )’. 
It is usual to consider 


F(t)=(S1)/N(N-1) (8.7) 


the sum being over all ordered pairs of distinct points in D not more than ¢ 
apart (Bartlett, 1964). Since we no longer allow a guard area, we expect a 
downward bias unless toroidal edge correction is applied, when (8.6) holds 
up to one-half the shorter side of the rectangle. The exact expectation for 
a binomial process (and for a Poisson process, because it does not depend 
on N) is known; see Kendall and Moran (1963, pp. 41-42) and 
Geciauskas (1977). However, it is possible to produce an unbiased esti- 
mator by weighting pairs of objects. We define (for N > 2) 


R(t)=A(S k(x,y))/N? (8.8) 


the sum being over the same pairs (x, y) as (8.7), but 1/k(x,y) is defined as 
the proportion within D of the circumference of the ball centered on x with 
boundary passing through y. Figure 8.5 explains the intuitive idea; since 
some objects at distance d(x,y) from x might be outside D and so be 
missed, we weight the observed object at y inversely by the probability that 
such an object would be observed. Of course we must consider both (x, y) 
and (y,x). We find (Ripley, 1976a) 


E(N7K(t))=[E(N)] K(t)=(AAYK(2) (8.9) 


for distances ¢ less than the circumradius of D. K (t) should not be highly 
correlated with N, so K(t) will be an approximately unbiased estimator of 
K(t). Although the derivation of (8.9) via (8.2) assumed homogeneity and 
isotropy, we can find the mean of K(t) for a binomial process with N 
points. It is 7f7(1—1/N) and is also the conditional mean E(K(t)|N(D) 
=N) for a Poisson process. The factor (1—1/N) is related to the choice 
of divisor N* or N(N—1); it seems immaterial in practice. From the 
interpretations a K(t) given in Section 8.1 we would expect K(t) to be 
smaller than wt? (its approximate expectation for a Poisson process) if the 
data form an inhibited or regular pattern, and to be larger in the presence 
of clustering or heterogeneity. Furthermore, the sizes of the deviations 
from wt? and the distances at which they occur should give some further 
insights into the pattern. Besag, in the discussion of Ripley (1977), 
suggested the use of a square-root scale to linearize the plot of R(t) vs. t 
for a Poisson process. This also has the effect of stabilizing the variances 
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for we shall see that N?Ki (t)/A has an approximately Poisson distribution. 
Note that this is a fortuitous coincidence, for whereas square roots always 
stabilize variances the linearizing transformation is the dth root in R?. 
Define 


L(t)=V(K(t)/7) 
Ly, = sup|L(t)—+| 


t<lo 


Then small values of L,, provide an intuitively reasonable test region for 
the null hypothesis of a Poisson process, for some maximum distance of 
interest ¢). Its distribution is approximated later in this section. 

At this point the reader may wish to look ahead to Section 8.6 to see 
some examples of the use of K and Lian 

Ripley and Silverman (1978) introduced a family of statistics related to 
both those in the last section and in this. Let dy)<dg)<-:- be the 
distances {d;;} in increasing order. Then d,,,, =d() =sup{t|K(t)=0)} is 
the smallest nearest-neighbor distance, and L,,>d,,,, with probable 
equality when the pattern is strongly regular. For a binomial or Poisson 
process 


N(N~1)d2,/A 


has a xd distribution for practical purposes for N> 15 (i=1) or N> 30 
(i< 9). In particular, d?;, has an exponential distribution of mean 
2A/aN(N~—1). These statistics are vulnerable to recording errors and 
should only be used when the coordinates are known better than to within 
one-third of din. 

Both Jolivet (1978, 1980) and Liebetrau (1977, 1978, with Rothman, 
1977) consider estimators of E( N(C)*) for a rectangle C of sides c, and c, 
and of area a. Liebetrau’s approach is to consider C on a torus D and to 


k(x, y) = ratio of ayb to 
whole circumference Fig. 8.5 Interpretation of k(x, »). 
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average [N(C+x)—a/A]? over all translations C+x of C. The average is 

V(C)=a>, (x, y)/A +Na/A—N?a?/A? (8.10) 
where the sum is over all ordered pairs (x, y) of distinct points and 


$((x,,%2),(1, Y2))= (C1 — [x1 — Vil) 4 (€2 — 1X2 “Pal 4 /ere2 
(t), =max(7,0) 


For a Poisson process V*=V(C)/(1—a/A) is an unbiased estimator of 
var(N(C))=Aa. A test of a Poisson process can thus be based on the 
ratio 


L*=(V*/\a)—1 (8.11) 


an analogue of David and Moore’s index of Section 6.1 computed from all 
possible quadrat positions rather than from a finite sample. 

Jolivet (1978) defined @(C)=E(N(C)*)/E(N(C)) as a “criterion for 
aggregation” and estimated this by 


j= > 2) o(%y) 


XA Se (8.12) 


where the sum is over ordered pairs of distinct points (x,y) with x ED, but 
where y is allowed in a guard area. The second term arises because 
Jolivet counted the pairs (x,x). J is unbiased if A is assumed known. 


Distribution Theory 


Note that F, K , V, and J can all be written in the form 


T=a>dV(x,y) +b 


where xD and either yED or yED* DD if we allow a border. Thus we 
can treat their asymptotic behavior in a unified way. Many asymptotic 
possibilities are open, for we can vary N (or A), D, and C or the scale of 
the distances separately or together. Perhaps the most natural case, 
considered by Liebetrau (1978) and Jolivet (1978, 1980) is to allow D to 
expand and to include more and more of a realization of a point process 
throughout the plane. Many authors have considered the asymptotic 
behavior of F(t) for a binomial or Poisson process: Hafner (1972), Kester 
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(1975), Silverman (1976, 1978), Saunders and Funk (1977), Silverman and 
Brown (1978), and Brown and Silverman (1979). 
Consider a binomial process with N objects and no border, in a square 


of sided. Let x(x,y)=4{¥(x, y)+¥(y,x)} so 


T=2a > x(X;, X;)+6 (8.13) 


i<j 


where (X;) are independent uniformly distributed random variables on D. 
Equation (8.13) exhibits T as a U-statistic (Lehmann, 1975, Appendix 5). 
However, we will allow x and the distribution of X, to change with N, 
making the standard asymptotic theory for U-statistics inappropriate. We 
find 


E(T)=aN(N-1)E(x(X,, X)) +0 
var(T) =4a?N(N-1)[ (N—2)var{ E(x(X,, X2)|X,)} 
+ i var(x(X;, X2)} | (8.14) 
The first term in (8.14) is due entirely to edge effects, being zero when 


toroidal edge correction is used. 
Consider first F and K. For small t/d we have (Ripley, 1979b) 


E(x(X,, X2))a(t/d)? 
var(x(X, X2))©E(x(%X, X2)) 
var| E(x(X,, X2)|X,)|~4e(t/d)> x =0.672 for F 
0.061 for K 


U= 2.;x(%;, X,) is a count for F, and an approximation to a count for K, 
with mean and variance 


E(U)=(Nt/d) 2/2 
var(U)=(Nt/d)?2/2+4x(Nt/d)?/N? (8.15) 


Clearly, Nt/d is a crucial parameter. If it is small, we have a count of 
rare events and will consider a Poisson approximation provided the second 
variance term is negligible compared with the first. Thus, for 25 points, a 
Poisson approximation is suitable for t/d< 0.25 for K, 0.13 for F , and for 
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N> 50, for t<6d/N for F or K. Beyond these limits we must use a 
Normal approximation. Note that in the range of ¢ for which the first 
term of (8.15) dominates the second var( F ) and var(K ) are proportional 
toN~?. The terms become equal at t=0.84d/N'/? for F, t=1.86d/N'/? 
for K. In particular, for 1> 0.6d/N'/° , K(t) has an appreciably smaller 
variance than F. This value is two to three times the mean nearest- 
neighbor distance for usual values of N. These Poisson and Normal 
approximations are supported by limit theorems, which do, however, give a 
warning; with toroidal edge correction, 1, D fixed and N-oo the limiting 
distribution of N( F(t) - F(t)) is non-Normal (Silverman, 1978; Gregory, 
1977). 

Not only can we regard N 2K(1) /2A_ as approximately Poisson for 
t<0.9d/N'?; we can approximate N?K/2A by a Poisson process of 
intensity N(N—1)2at/A in this range. This result gives the distribution 
of d,,, and can also be used to find percentage points for L,, by simulating 
this Poisson process. 

We can apply a similar analysis to 


I=m2 > $(X;, X;) 


i</ 


except that since this is not a count, a Poisson limit is inappropriate. We 
have, for a binomial process with N points and e=area (C)/area (D)= 
a/A«l, 


E(1)=N(N-1e 
var(I)=N?(8e/9 —2e*) + N3(3.2e7°) 


for var[ E{@(X,, X>)|X,}]~0.8e7>. 
We can safely ignore the edge effect term if e< N ~?/?/2, and of course 
if toroidal or guard area edge correction is used. In that case 


E(V*)=Ne — var(V*)=8e°N?/9 
E(L*)=0 _ var(L*)~8e/9 

E(J)=[N+N(N-l)e]/AA 
var( J )8N7e/9(X A)? 


Both Liebetrau and Jolivet considered Poisson processes, so we take 
expectations over N. Note that the mean and variance formulas for L* 
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are independent of N. For J we have 
E(J)=1+Aa 
var( J)=(4+8/9)e+4Aae+1/AA 


so the variance for a Poisson process is at least five times that for a 
binomial process with the same intensity! Clearly, the conditional vari- 
ance should be used. 


8.4 MODELS 


Chapter 6 introduced two informal models for clustering and for hetero- 
geneity and pointed out the tenuous nature of the distinction between 
them. We can now formalize these processes and show that certain 
distributions of point processes can arise from either mechanism. 


Cluster Processes 


A Poisson cluster process is defined by taking a Poisson process of 
intensity a of parent points and centering on each parent an independent 
daughter process of objects. The observed process may be either parents 
plus daughters or just all daughter objects. We will assume the latter. 
Another way to view this mechanism is to have an infinite set of indepen- 
dent identically distributed processes, to use a Poisson process to select a 
translation for each, and then to add up the objects in all the translated 
processes. This suggests a modification in which we choose a Poisson 
process of rigid motions, thereby giving each daughter process an indepen- 
dent uniformly distributed rotation. We will assume that each daughter 
process contains a finite number of objects. The cluster process will 
always be homogeneous, but will only be isotropic if the daughter process 
is isotropic or if the daughters are given an additional rotation. The most 
useful subclass of Poisson cluster processes is Neyman—Scott processes, for 
which each daughter object is independently distributed around the parent. 
Then if n is the (random) number of objects in the daughter process, 


K(t)=at? +aE(n(n—1))f(t)/¥ (8.16) 


p(1)=1~exp{ ~a f[1-£((x.1)")] dx} (8.17) 
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where all expectations are over n, f is the cumulative distribution function 
of the distance between two daughters with the same parent, and g(x, f) is 
the probability that a daughter point does not fall within distance ¢ of x. 

Equation (8.16) can be derived from interpretation (1) of K(t). The 
pairs of points can come either from different clusters giving the first term 
by the independence of clusters, or from the same cluster. If there are n 
objects in that cluster, the expected number of pairs not more than ¢ apart 
is n(n— 1) f(t), from which is derived the second term. Note that K(t)— 
mt? is an increasing function and that we can infer an estimate of f(r), and 
hence the cluster size, from K(f). Formula (8.17) is a special case of 
(9.10). Consider the N parent points within a bounded set D. For large 
D of area A 


1—p(t)~P(no daughter with parent in D is within ¢ of the origin) 
= Se-*4(aA)”{ P(no daughter within ¢|parent in D)}“/N! 


=exp ~aA{1—P(no daughter within t|parent in D)} 
exp ad{ 1— f £(s(x,1)")4x/4] 
D 


using the independence of the daughters and the uniform distribution of 
the parent in D. Letting D increase gives (8.17). 

Of course the parent process need not be Poisson; it could itself be a 
cluster process, giving rise to processes with a (finite) hierarchy of clusters. 
Another possibility is to take a regular process of parents to avoid the 
overlap of clusters. 


Doubly Stochastic Poisson Processes 


We can define a homogeneous process by taking a heterogeneous Poisson 
process with mean measure A, where A is a realization of a stochastic 
process, stationary under translations. If A is isotropic, then so is the 
resulting point process, called Cox or doubly stochastic Poisson. As an 
example, suppose we take the Neyman-Scott cluster process with a Poisson 
number mean £ of daughters uniformly distributed within a disk of radius 
r centered on the parent. Then if we condition on the parent Poisson 
process, we have a Poisson process of intensity A(x), 8 times the number of 
discs which cover x. Thus this is a Cox process, one of several defined by 
Matérn (1971). Clearly, this duality holds whenever the number of 
daughters in a Neyman-—Scott cluster process has a Poisson distribution. 
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It seems unknown just how generally Poisson cluster processes are also 
Cox processes. (Bartlett, 1963, gives a counterexample in one dimension.) 


Regular Processes 


We introduced in Section 2.4 a general way to build new point processes 
from the Poisson process. The most useful are those based on pair 
potentials, with 


(x) =ab*™ |] h(d(x,, x,)) (8.18) 


i<j 


and the probability of any event E is 
PE)= { (x) dPo(x) 
E 


where P, is the distribution of a Poisson process of intensity A. As noted 
in Section 2.5, such processes can only be defined directly on a bounded 
set D. We can and usually will sidestep problems at the edges by making 
D a torus, opened up to be a rectangle. The constant a is chosen to make 
P,(2)=1, so a is almost always an unknown function of b, A, and the 
parameters defining h. 

Kelly and Ripley (1976), following Strauss (1975a), introduced Strauss 
processes with h(d)=c, d<r, h(d)=1,d>r so that 


(x) = ab Mei (8.19) 


where ¢(x) is the number of pairs of points in x less than r apart. The 
necessary and sufficient condition for a to be able to be chosen finite and 
nonzero is0<c< 1. This family is characterized by the property that the 
probability density of an object being found at a point x, conditional on all 
the other objects, depends only on the number of objects within r of x. 
An important special case is c=0, which corresponds to keeping only those 
realizations of a Poisson process with no object pairs less than r apart, and 
so is a model for the centers of nonoverlapping disks of radius (7/2). 
Strauss processes have a subtly different sequential packing version. 
The first object is placed uniformly in D. Subsequent objects are gener- 
ated uniformly in D, and accepted with probability c’, where s is the 
number of existing objects closer than r to the possible new object. This 
process is repeated until a fixed number of objects are tried, or a given 
number are accepted. For c=0 the latter is the SSJ process of Smalley 
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(1966) and Diggle et al. (1976). Other, related, processes for disk centers 
are described by Matérn (1960), Paloheimo (1971), Bartlett (1974), and 
Ripley (1977). 

Ord (in the discussion of Ripley, 1977) proposed a modification in which 
the areas of the Dirichlet cells were used. This seemed appropriate for 
towns and possibly trees when a reserved food area is needed. This 
model would have a density of the form 


o(x)=ab*™ |] g(area of Dirichlet cell of the ith point) (8.20) 


Almost no analytical work is possible with the models defined by (8.18) 
to (8.20) and we have to resort to simulation. In principle, a realization of 
the process can be obtained by sampling realizations of a Poisson process 
and accepting each realization independently with probability o(x)/M, 
where M is an upper bound for (x), if such exists. In practice, almost 
every realization is rejected, making this process far too slow. An alterna- 
tive simulation technique has been proposed by Ripley (1977, 1979a). To 
generate a sample in D with N objects take any pattern with N points and 
(x) >0 and operate the following process repeatedly. Choose one object 
at random, delete it, keeping (x,,...,x,), say, and choose a replacement 
object x, from the conditional density 


A$(xy,..., Xn )/P( XQ 5-065 Xpq) 
= ABTA d(x,,.x,)) for (8.18) 
2 


Again the normalizing constant A is chosen to make this a probability 
density. We probably still do not know A, but the rejection procedure 
suggested as impracticable for all N points may be possible for x, condi- 
tional on (x,,...,x,). This process ultimately yields (dependent) samples 
from the distribution P,. With a sensible choice of initial pattern practi- 
cally independent samples can be taken every 2N steps. A FORTRAN 
algorithm for the Strauss process and proofs are given in (Ripley, 1979a). 


Estimation 


Estimating the parameters for these processes is a difficult problem. 
Little direct progress has been made towards its solution. For cluster 
processes maximum likelihood involves looking at all possible allocations 
of objects to clusters. For pair-potential processes the likelihood contains 
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an unknown normalizing constant which is a function of the parameters, 
so progress by classical means is immediately blocked. Besag (1978) 
estimated the parameter c of a Strauss process by taking a grid of counts 
and fitting an auto-Poisson process to this by the approximate method of 
pseudo-likelihood. The resulting estimates for the Spanish towns data of 
Ripley (1977) fit less well than do those obtained in that paper by trial and 
error. For this model with c=0 Ripley and Silverman (1978) showed d,,;, 
is a sufficient statistic for r and that a UMP test for r=0 (Poisson) against 
r>0 is based on d,,,,. Further, the maximum likelihood estimate of r is 
Gin and the asymptotic distribution theory for d,,, can be used to give 
confidence limits for r, for M(N—1)(d2;,—7r*) has approximately an 
exponential distribution of mean 24 /z. 

Diggle (1978, 1979, 1980) has formalized the trial and error procedure. 
A goodness-of-fit test will be based on some “distance” between a statistic 
T for the data and its expectation 7° for the model with parameter 0. 
Diggle’s idea amounts to choosing 8 to minimize this “distance” and then 
choosing some other statistic as a test of goodness of fit. If T® is not 
known it can be estimated by simulations of the model. Such a procedure 
is frequently used informally. A better check on the fit of the chosen 
model can be made by looking at more data. Frequently extra data are 
unavailable. One way out is to divide the original data into two or more 
subsamples and to estimate parameters from one sample, testing the fit 
against the rest. In practice commonsense will prevail. Often the prob- 
lem is to find a good enough fit with parameters estimated from the data! 


8.5 COMPARATIVE STUDIES 


Both Diggle (1979) and Ripley (1979b) have published power studies of 
selections of the tests described earlier for the null hypothesis of a binomial 
process against cluster, Strauss, and SSI alternatives with the same number 
of objects. Diggle performed 4% Monte Carlo tests with N=100 objects 
based on the same 99 binomial simulations. He considered the Clark- 
Evans test CE,d,,d,,d,, L,,(but with tg =0.25d, unreasonably large for 
N=100) and the variance/mean ratio q for a 5X5 grid of squares. 
Against SSI alternatives L,, was clearly dominant. The cluster alternative 
was a Poisson cluster process with one plus a Poisson number in each 
cluster, with mean ranging from two to four daughters per cluster. Here 
all of CE, L,,, d,, and q did well. Perry and Mead (1979) give some 
additional power studies for q. 

Ripley (1979b) considered CE, d*, S, G, L,, and I (equivalent to J and 
L* for a fixed sample size). Against the regular alternative of a Strauss 
process L,, (with fg /d=0.25 for N=25, 0.125 for N=100) was clearly 
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superior, with CE next and G effective against packed disk alternatives, 
that is, c=0. For cluster processes of the type described under “‘doubly 
stochastic Poisson processes,” CE, L,,, and J were equally effective, with J 
surprisingly effective at large numbers in large clusters that would nor- 
mally be classified as heterogeneity. For J the square C was taken as 
to Xtg; thus it corresponds to an analogue of q with all quadrat positions 
considered (and quadrats of about the same size as those used in Diggle’s 
study). 

We have seen how toroidal edge correction can simplify the distribution 
theory for methods based on nearest neighbors and on second moments. 
The Ripley (1979b) study was conducted both with specialized edge 
corrections (Donnelly, K ) and with toroidal edge correction. There was 
no significant difference in power against cluster alternatives; against 
regular alternatives the toroidal correction was usually less effective (giving 
lower power); this was most marked for L,,. 

These seem the only studies on which one can base recommendations. 
Tests based on CE, L,,, and J have the clear advantage that their 
distribution under the binomial process is known to a good approximation, 
at least for nearly square regions D. These are also the tests that did well 
in the power comparison. If hand calculation is necessary, only CE 
would be used. (However, it is surprisingly difficult to find all the 
nearest-neighbor distances correctly in a pattern of any size.) With a 
computer all three methods involve searching over all pairs of points and 
so take approximately the same time, proportional to N?. More efficient 
methods are available for large N. For nearest-neighbor distances we 
could use the Dirichlet cell contiguity information, since the nearest 
neighbor must be the defining object of a contiguous cell. Ultimately, this 
needs a time of order Nlog N. For L,, and I we will be interested in short 
distances, and by sorting the data objects we can screen out a large 
number of pairs. However, at least in my implementations, none of these 
complications is as fast as the naive methods for N < 100. 


8.6 EXAMPLES 
Other examples of analyses of point patterns by similar methods may be 
found in Ripley (1977, 1979c). The former are considered further by 
Besag (1978) and Diggle (1978, 1979, 1980). 
New Zealand Trees 


Figure 8.6 illustrates the analysis of a plot of trees reported by Mark and 
Esler (1970). These workers recorded the diameter of each tree and 
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Fig. 8.6 New Zealand trees—86 points in a 14085 feet rectangle. (6) Dirichlet cells. (c) L. 
(d) p for the data and the envelopes of p for 50 binomial simulations. 


related this to the “area potentially available” of Brown (1965), the area of 
the Dirichlet cell defined by the tree. We will be concerned solely with 
the pattern of the trees as small objects. There are 86 trees in a rectangle 
about 14085 feet. There is an obvious boundary line on the right-hand 
edge of the plot in 8.6a. These trees were discarded from the analysis. 
Figure 8.65 shows the Dirichlet cells and illustrates the number of close 
pairs of trees. The statistics L and p are shown in Figure 8.6c,d. The 
band on the plot of L is the 95% confidence band based on L,,; 95% of 
realizations of L calculated from a binomial process should lie within this 
band. JL for the data does! The minimum distance between a pair of 
trees is about 2 feet, near the upper 5% point for a binomial process, but 
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Fig. 8.6 (continued ) 
N.B. The y-axis scale of all L plots shows L//A. 
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(a) 


Fig. 8.7 Swedish pines. 71 trees in a 10-meter square. (b) Dirichlet celis. (c) L. (d) p for 
data and 100 binomial simulations. (e) and (f) L and p for data and 50 simulations of a 
Strauss process with c=0.2, r=70 cm. 


172 


(d) 
Fig. 8.7 = (continued) 173 
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unreliable since the data were digitized to the nearest foot. The p plot 
shows the upper and lower envelopes of # for 50 binomial simulations. 
Since a Monte Carlo test shows that at each distance ¢ the probability of 
the data giving an extreme value of f(r) is 2/51, and this never occurred, 
clearly this data set is consistent with a binomial process, and hence with a 
Poisson process. 


Swedish Pine Trees 


Not all trees have a random pattern. Our next example, Figure 8.7a, 
shows 71 pine saplings in a 10 x 10-meter square from Strand (1972). The 
Dirichlet cells in Figure 8.76 look much more regular than do those of the 
previous example, suggesting that the pattern is on the regular side of 
random. The plot of L in Figure 8.7c rejects the Poisson null hypothesis 
(in fact, L,, does so at 1%), and d,,;,, is 22 cm, at the upper 2% point, again 
suspect because of insufficient accuracy in the coordinates. The p analy- 
sis of Figure 8.7d again suggests that the trees are regularly spaced, there 
being too few empty circles with a 70-cm radius. 

Having rejected a Poisson process, we look for an alternative. The plot 
of L suggests we might choose a Strauss process with an inhibition distance 
r=70 cm, about the point of maximum deviation of L from the central 
straight line. The value of c was chosen by trial and error. Figure 8.7e, f 
compares L and / for the data and 50 simulations of the Strauss process 
with r=70 cm, c=0.2. Although the p plot suggests that the data may be 
more regular than these simulations, the fit is much more adequate than a 
Poisson process. The value of 70 cm makes physical sense for young 
trees. An alternative model would be to use Ord’s suggestions and to 
discriminate against small Dirichlet cell areas. The parameters of the 
Strauss process provide a convenient summary of the data, which may be 
compared with those from similar plots elsewhere. 


Magnetite Crystals 


Davis (1973, pp. 308-311) gives an example of 47 magnetite crystals in a 
vertical 80 100-cm slab of anorthosite rock. He detected regularity 
spuriously, using the uncorrected version of Clark-Evans’s test. (Don- 
nelly’s corrections reduced the value of CE from 2.59 to 1.48.) Figure 
8.8a shows an apparent increase in intensity from top to bottom. Whether 
this is significant will depend on the type of pattern, for we would expect 
large intensity variations in cluster processes, but almost no variation in a 
very regular process. Initially we will suppose that the underlying process 
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Fig. 8.8 Magnetite crystals. 47 in a 80 100 cm slab of rock. (b) L. (c) and (d), L and p 
for data and envelope of 100 binomial simulations. 
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Fig. 8.9 Top (a,c) and bottom (4, d) halves of magnetite crystal data. 


(d) 
Fig. 8.9 (continued ) 
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is homogeneous. Figure 8.8b,c shows plots of L. From the enlargement 
in Figure 8.8c we see that L,, rejects the Poisson null hypothesis at 5% 
solely because d,,,, is too large. (The envelope of simulations shown there 
shows that the confidence band is of the correct shape.) In fact, d,,;, iS 
3.2 cm, about the upper 1% point of its null distribution. The points were 
recorded to the nearest centimeter, so this should be reliable. The plot of 
Pp in Figure 8.8d does not reject the Poisson null hypothesis, but is 
consistent with slight inhibition. 

Figure 8.9 shows the upper and lower halves of the slab separately. 
These contain 18 and 29 crystals. From the previous analysis we will be 
justified in regarding these as independent Poisson variates. A chi-square 
test for the difference in their means has a significance level of about 10%. 
Many other trends in intensity could have been spotted, so we attribute the 
intensity variation to chance. The plots of L for the two halves are 
reassuringly similar. 

Apparently some inhibition between crystals would be expected since 
they compete for heat energy during the growth process. Our next step is 
to fit a hard disk process to model this. We chose the Strauss model with 
c=0. It remains to choose the disk diameter r. Section 8.4 gave the 
maximum likelihood estimator as d,,,; we chose r as the median of the 
exponential distribution for d2,,—r*, giving 7=2.9. Figure 8.10 com- 
pares L and p for the data and 100 simulations of this process. 

There are more sophisticated techniques available for detecting trends in 
a Poisson process. If we are looking for a trend in the y direction, the y 
coordinates of the points form a Poisson point process on the line to which 
we can apply the tests described in Cox and Lewis (1966, Chapter 3). 
Also, we saw in Chapter 6 that we can apply log-linear models to quadrat 
counts. Using 10 horizontal rectangles and a trend of the form e*%’, a is 
estimated as 1.06 per meter, and the likelihood ratio test for a=0 has a 
value 4.27 of the “deviance,” to be referred to a x? distribution. This 
suggests that the trend is somewhat more significant than does the binary 
division. It appears that the bottom 20 cm has a rather higher intensity 
than does the rest of the rock. 


Birds’ Nests 


Tubbs (1974), Newton et al. (1977), and Marquiss et al. (1978) analyze the 
spatial distribution of nest sites of buzzards, sparrowhawks, and ravens, all 
large birds of prey, the first two of which nest in tracts of woodland. All 
three studies worked with nearest-neighbor distances and had problems 
with the disconnected nature of the study area. Here we look at two 
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Fig. 8.10 ZL and p for crystal data and 100 simulations of hard disk model with disk diameter 
2.9 cm. 
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Fig. 8.11 Eagles’ nest sites in a square of side 80 units. (b) Dirichlet cells. (c) L. (d) Pp plus 
the envelopes of 100 binomial simulations. (e) and (f) L and p for data and 100 simulations 
of hard disk model diameter 10.5 units. 
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Fig. 8.12 Peregrines’ nests in a unit square. (a)-(f) as 8.11 except diameter 0.075. (g) and 
(A) L and p for data and 50 simulations of the model with pair function shown at (i). 
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(d) 
Fig. 8.12 (continued ) 
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Fig. 8.12 (continued ) 


moorland nesting species—golden eagles and peregrines. Figures 8.lla 
and 8.12a illustrate the positions of the nests of 14 and 20 pairs of birds. 
(Where a pair used a group of nests, the centroid has been chosen.) The 
scales and orientations are arbitrary. It is a matter of dispute as to how 
homogeneous the study areas are, and in particular whether we measure 
the birds’ preferences or the pattern of available nest sites. 

For the eagles even with only 14 nest sites, the Dirichlet cells, L, and p 
all clearly indicate a regular pattern. The minimum internest distance is 
beyond the 1% point for a Poisson process. Note that heterogeneity 
would tend to reduce the true area and so reduce d,,,._ The number of 
nests is too few to use the asymptotic theory to choose the disk diameter r 
as we did for the crystals. In fact, 7 was chosen by trial and error as 10.5, 
to give d,,, for the data about the median for the simulations. Figure 
8.1le, f shows L and fp for the simulations. This model inhibiting nest 
pairs closer than 10.5 units seems a good fit. If we think of a circular 
territory around the nest of this diameter, these territories cover only 19% 
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of the available area. The existence of inhibition between nests is biologi- 
cally somewhat surprising, as eagles hunt many miles from their nests and 
are not known to fight for territory. Perhaps a better explanation is that 
we detect the pattern of outcropping rocks, some of which are occupied as 
nest sites. 

The peregrines also exhibit obvious regularity, although L,, is only just 
beyond its 5% point. Figure 8.12d, e illustrates an experiment with a hard 
disk model with r=d,,,. This is not too satisfactory, the shape of L 
indicating further discouragement against establishing nests just beyond 
that distance. Figure 8.12: illustrates a more general pair potential func- 
tion Ah of (8.18) designed to take this into account. The corresponding 
simulations in Figure 8.122, h show a slightly better fit. The behavior of p 
in both these examples is puzzling, since it indicates a more regular pattern 
than that produced by the simulations that match the internest distance 
distribution. It is also surprising that it is difficult to fit such small 
patterns. Possibly there is some unexplained heterogeneity that affects p. 
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CHAPTER 9 


Image Analysis and 
Stereology 


Figure 1.3 illustrates a complex type of pattern which we will model not by 
a random surface or set of points, but by a random sef. Most of the 
applications are to three-dimensional bodies with two phases: 


Bone and tissue 

Blood cells and plasma 
Pores in sedimentary rocks 
Dislocations in metals 
Glass fibers in resin 


There are a few planar examples, often referred to as mosaics, such as the 
division of a region into different plant communities. 

Practical measurements can be made either on planar slices or linear 
probes through a three-dimensional specimen. Even planar mosaics have 
until recently only been analyzed by a line intercept. Stereology is the 
theory of recovering three-dimensional information from one- or two- 
dimensional samples. Chayes (1972) provides an entertaining account of 
the early development of the subject. There may be problems even in 
quantifying a planar section. Image analyzers are essentially machines to 
scan a section and to record information at each point of a very fine grid 
(typically 1024 x 1024, 1024=2'°). The information is usually a gray level 
or whether or not the density of the image exceeds a threshold. Clearly, 
even with a binary measurement a vast amount of information (128K 
bytes) is available that can be handled by a dedicated minicomputer or 
stored on magnetic tape and analyzed later by a large computer. With a 
dedicated machine there is another possibility; some analyzers are ad- 
dressable so the computer can ask for measurements at specific grid 
locations and so avoid the storage requirement. With the rapid decline in 
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the cost of small computers and especially of memory this high-technology 
approach may become more widely available. 

Random set theory was developed by Matheron (1967, 1975) to formal- 
ize the measurements which can be made by image analyzers and to 
provide some models for random sets. A more abstract version of the 
theory was developed independently by Kendall (1974). To date very 
little statistics (in the sense of the examination of and fitting of models to 
data) has been attempted. Section 9.1 sketches the probability theory, 
using a little topology. 

The twin subjects of geometrical probability and integral geometry are 
much involved in the sampling theory of linear and planar sections of three 
dimensional bodies. Sections 9.2~9.4 consider aspects of this problem. 
Again, statistical aspects have been inadequately explored, for geometrical 
probability is well suited to proving unbiasedness under randomization but 
not to finding even standard errors. 


9.1 RANDOM SET THEORY 


Let X denote the basic space, usually R? or R°, and consider a set A within 
X. For definiteness we will adopt the terminology of sedimentary rocks 
—A will be the conglomerate of grains and X\A the pore space. The 
most obvious way to specify a stochastic process Z generating A is by 
Z(x)=I1,(x), defined to be one if x A, zero otherwise. (J, is the indica- 
tor function of A.) Conversely, if Z is a function on X taking values zero 
or one, we can define the corresponding set A by 


A={x|Z(x)=1} 


The distribution of the stochastic process Z then specifies the probabilities 
of events of the form 


{Z(x)=1 WxEK, Z(x)=0 VxeEK’} 


={ADK, ANK'=2} (9.1) 


for finite sets K and K’. This was the theory proposed by Matheron 
(1967), but it is insufficiently rich. To see why we have to analyze in 
more detail the functions of an image analyzer. Recall that the output 
may be considered as /, evaluated at a very fine grid of points. 


RANDOM SET THEORY 193 


Basic operations: 


Minkowski addition A®B={x+y|xEA, yEB} 

Reflection B={—x|xeB} 

Subtraction AQB=(A°@BY = (x|(x— B) CA} 

Dilation of A by B A® =A@B={x|\(B+x)NA#D} 

Erosion of A by B Ay =AQB={x|(B+x)CA4} 

Opening of A by B AwB=(AQB)@®B= U{B+x|(B+x)CA} 
Closure of A by B A{B=(A@®B)OB=() (B+x|(B+x)NA#D} 


(See Figure 9.1.) : 

Clearly, AU B, ANB, A‘, and A are easily realized by logical and, or, and 
negation on the image analyzer output, and by rotating the array. A®B 
can be found from 


A@B= (J (A+y) (9.2) 
yes 


So we score one for A®B whenever a one occurs in any of the translated 
arrays A+y. Serra introduced this calculus for sets; his texture analyzer 
(see Klein and Serra, 1972) used a triangular grid which made calculation 
of A@B for a hexagon B easy; hexagons were used as approximations to 
circles. 

We will use “AN B*” sufficiently often to introduce the shorthand 
“At B” read “A hits B.” With our indicator-function theory we can find 
probabilities of events of the type 


{KCA, K'tA} (9.3) 


Clearly, a stationary point process is a random set A, but we would expect 
the probability of (9.3) to be zero. Since point processes are useful 
starting points from which to build random sets by Serra’s calculus, we 
need more general sets K and K’ in (9.3). To do so, we must restrict the 
class of random sets; we assume that A is closed. This amounts to 
allocating the grain-pore boundary surface to the grains. Our basic events 
to which we assign probabilities then become 


(Atl, i=l,....2, ANTy=@} (9.4) 


for 7,€9, a class of “test sets” that replace the points in (9.3). We will 
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assume that finite unions of test sets are again test sets. By the inclusion— 
exclusion formula we have 


P(ANT, =@,i=1,...,m, ATT;, j=m+1,..., 2) 


=>(-1)"!- "PUAN T=O VTEY) (9.5) 


the sum being over all subcollections y of (T7,,...,7,,) which include 
(T,,..-»T,). Thus we need to specify 


O(T)=P(ANT=@) VWTES 


So far we have avoided specifying 3. Christensen (1974) explored equiva- 
lences in the choices of § and Kendall (1974) showed certain properties 
were crucial. We have 


T, AT implies Q(T;) Q(T) (9.6) 
K, \K implies Q(K,) 70(K) (9.7) 
Q(G)=inf{Q(K)|KcG} (9.8) 
Q(K)=sup{Q(G)|GIK} (9.9) 


where test sets denoted G and K are open and compact respectively. We 
will start with J either a base for the topology of K, say all finite unions of 
open balls, or all compact sets. Then Christensen’s results show that 
Q(T) is defined via (9.6)—(9.9) precisely for those sets which are countable 
unions of compact sets. 


Theorem 9.1 (Choquet, Matheron, Kendall) 


Q is P(ANT=®) for some random closed set A if and only if 
(i) Q(@)=1 

(i) T, CT; implies Q(T,)> Q(T;) 

ii) 2(-—1I)M OCU (TT, Ey}) >0 summed over all finite subcol- 
lections 
y of (T;,...,7,), any finite subset of 5. 

(iv) If Dis a base 
Q(T )=inf{Q(G)|GCKCT for some compact K}. 
If 3 is the compact sets K, \K implies Q(K,,) 7 Q(K). 
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The reader may recognize from (9.5) that the expression in (iii) is just 
P(ATT;, i=1,...,2). It can be checked (Matheron, 1975, Section 1.5) that 
A®K, A, AOK, AwK, AfK are all random sets for a compact set K. 

In moving from finite sets to more general test sets we have lost 
something. For a single point set {x}, 


{{x} cA} = {{x} 14} 


so we could consider {K CA} for finite sets K. Now P( BCA) is defined 
for any subset B of X (since A is closed we need only check AD BND fora 
countable dense set D), but it may not be computable from Q. 

Our stochastic process will be labeled by J and give I( ATT) for each 
TES, Whereas every closed set A gives a family of indicator functions 
the converse is no longer true, the conditions of Theorem 9.1 being those 
necessary to ensure that probability one is given to those realizations that 
do correspond to closed sets. Similar problems arise in the corresponding 
theorem for point processes (Ripley, 1976b), but we avoided that theory by 
constructing all our point processes via the Poisson process. We can also 
avoid Theorem 9.1, for our examples of random sets are constructed from 
Poisson processes, 

Matheron (1967) defined a “Boolean scheme” as a generalization of a 
Poisson cluster process. Take a Poisson process with mean measure A 
and at each point form a disk or sphere with radius drawn independently 
from a specified distribution. The random closed set A is the union of the 
disks (or spheres). More generally we can replace the disk by an indepen- 
dent copy of a compact random set G, and let A= U(G, +é;), where ; are 
the points of the Poisson process and (G,) independent copies of G. 
Suppose we look within a compact set D. 


P((G, +£,)AT=Q|é, ED) = 7 Oq(T-£,) dA(E)/A(D) =Bp say 


P(ANTND=2) 


- P(no (G,+€,)tT for €; ED) 


= De Bp)"/nt=exp[ —A(D)(—- Bp) ] 


=exp— f-e(T-8)} d(&) 
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the approximation coming from discarding grains G,+£, with centers 
outside D. If we let D increase to X in a suitably regular way 


(7 =erp| ~ f (1-o(7-£)} dal (9.10) 


We would like to have P(TCA), but this seems impossible to compute in 
any usefulform. This heuristic derivation of Q(T) follows Watson (1975). 
Matheron (1975) gives a rigorous treatment and abstract characterizations 
of this and related processes. 

The “Boolean scheme” is an unrealistic model for a rock, for no 
interaction between the grains is assumed. Other models can be con- 
structed based on more general point processes, again avoiding Theorem 
9.1. An example is to take the models for the centers of nonoverlapping 
disks proposed in Section 8.4 and to use the disks as a random set. 

Another class of models is based on the random division of space. 
Pielou (1977, Chapter 12) describes two models for mosaics of vegetation. 
The cells of the division are filled independently with one of the possible 
types of vegetation. Models for the random division of space are surveyed 
by Miles (1972b); Pielou used two, the Dirichlet cells of a Poisson process 
and the tessellation created by a Poisson process of lines. The latter gives 
a simple sampling theory for a line intercept (Pielou, 1964, 1965; Switzer, 
1965). 

The models available are unsatisfactory and inferential techniques for 
them almost non-existent. Dupaé (1980) gives a method to unravel the 
size distribution and mean number of circular grains of variable radius in a 
Boolean scheme. 

Although we may not know how best to use them, estimates of quanti- 
ties such as P(TCA) are available for homogeneous random sets. An 
image analyzer views the random set A within a window D. Let v denote 
area in the plane. Then by counting points on the grid we can estimate 
very accurately »(CMD)/»(D), where C is a set created from A using 
Serra’s calculus. This suggests estimating P(TCA) by 


»({x|T+xCA,x€D})/v(D) (9.11) 


Unbiasedness of (9.11) follows by interchanging expectation and integra- 
tion over D. The estimator of (9.11) is not practicable since T+x might 
cross the boundary of D. To remove these edge problems we use a border 
(in the sense of Section 8.1) and allow x to vary in DOT. Thus our 
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unbiased estimator of P(TCA) is 


# {x|T+xCA,T+xcD} 
#{x|T+xcD} 


To estimate Q(T) we have 


# {x|T+xCA°,T+xcD} 


# (x|T+xcD} ene) 


This is the form used for 1—f(t) in Chapter 8 and is again unbiased. 


There is a general procedure here. We have formed two new random 
sets A’=AQT and A@®T and computed 


v(A'NE)/o(E) (9.13) 


for E=D@T, to estimate P(TCA) and 1—Q(T). Because A’ is also 
stationary 


E(v( A'NE)/»(E))= f PEA’) dx/n( E)= PEA’) 


which proves unbiasedness of (9.13) for P(QEA’) for any random set A’ 
that can be constructed within E from A/D using Serra’s operations. 
The variance of (9.13) is given by 
E(v( A’ NE) /o( EY) - PEAY 


=f {P(xEA’,yEA’)— P(KEA)P(y GA’))} dx dy/o( EY 
EXE 


={ _ cov(1,(x), Ly) dX dy/( EY (9.14) 
EXE 


Now cov(1,-(x), Z,(y)) is just the covariance function C(y—x) of our 
zero—one stochastic process Z abandoned earlier. However 


C(h) = P({0,h} CA)— P(OEAY? 


= 2({0,h})—O({0})” 


and so can be estimated itself from (9.12) and the estimate (possibly 
smoothed or fitted by a parametric model) used in (9.14). 
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9.2 BASIC QUANTITIES 


Suppose we have a compact convex body Y on which we wish to make 
measurements. We will work in d-dimensional space (d=2 or 3) although 
Y might be of lower dimensionality, for instance, a surface or curve in R®. 
Let p, denote k-dimensional content (count, length, area, or volume). 
Define 


W(Y)=n,(Y) 


WAY) =E{ug_,(I1s¥)} O0<k<d (9.15) 


where S is a uniformly distributed k-dimensional direction and II, denotes 
the projection of Y onto the (d—k) dimensional subspace determined by 
S. Then the Minkowski functionals are defined by 


WY) =b (Y)/ba_x O<k<d 


WIO(Y )=b, (9.16) 
where 
b, =7*/2/T(1+k/2) 
is the content of the unit ball in R*. For a ball Y of radius R 


WLO(Y) =b,RO* (9.17) 


These quantities can be expressed in simple cases in more familiar terms. 
F(Y)=dW{“(Y) is the surface area (or perimeter length) of Y. M(Y)= 
dW§®(Y) is the average over all orientations of the caliper diameter in that 
direction, the distance between two parallel planes (or lines) touching the 
body. If the body has a smooth enough surface 0Y to define curvatures 
then there are (d— 1) principal curvatures k,,...,«,_,- (The surface can 
be thought of as locally quadratic in a coordinate system with origin at the 
point of consideration. The x; are the eigenvalues of this quadratic form.) 
In R* «, is the inverse of the (local) radius of curvature, whereas in 
R?> «x, and «, are the inverse of the radii of curvature of the most curved 
and least curved lines on the surface through the point being considered. 
M=dW is the integral over the surface of the average of these curva- 
tures (Miles, 1975; Santal6, 1976, p. 222). Thus we have in the practical 
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cases 

R' W(Y)=L(Y) length 
W,(Y)= P(0Y)=2 number of end points 

R? W(Y)=A(Y) area 
F=N=2W,(Y)=L(0Y) perimeter length 
M=2W,(Y)=C(0Y)=27 total curvature 

R? W(Y)=V(Y) volume 
F=3W(Y)=S(0Y) surface area 


N=M=3W,(Y)=K(aY)=! f (x, +) integral mean curvature 
eY 


3W,(Y)=G(aY)= f ik =A Gaussian curvature 


Minkowski functionals are important because W{? is (d—k) “dimen- 
sional,” they are invariant under rigid motions, increasing, continuous (for 
a suitable topology on compact sets) and additive in the sense 


WAN Y)+W( XU Y)=H(X)+H(¥) (9.18) 


whenever X,Y, XU Y are convex. Furthermore, any functional with these 
properties is a linear combination of Minkowski functionals (Hadwiger, 
1957, pp. 221-225). 

Clearly the intuitive interpretations can be extended to nonconvex 
bodies. The additivity allows us to extend the definitions to finite unions 
of disjoint convex sets, and in fact we can go to ovoids, arbitrary finite 
unions of compact convex sets, by defining 


W(Y)= > W(Y,)- SW NY,)- +(-)" WYN NY,) 
i i<j 


(9.19) 


where Y=Y,m--- 1 Y,, is any decomposition of an ovoid into compact 
convex sets. The intuitive interpretations hold true for many compact 
sets, C(dY) being 27 times the number of bodies minus the number of 
holes and G(dY) 47 times the number of bodies and holes. Figure 9.2 
gives an intuitive interpretation. Note that the original definition (9.15) 
holds only for convex bodies. 
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Fig. 9.2 The body shown can be divided into four convex 
bodies A, B,C, D with the double boundaries being the 
overlap. From (9.19) the total curvature is 42a for 
A, B,C,D minus 4X27 for the four dashed convex bodies, 
hence zero. 


Image Analyzer Measurements 


Watson (1975) shows how an image analyzer may be used to compute 
Minkowski functionals of a planar set Y within D. Obviously we can find 
the area A(YMD) by counting ones in the binary representation, and we 
will relate all the other measurements to areas of sets transformed by the 
operations of Serra’s calculus. If Y has a smooth boundary and B is a 
ball of small enough radius r 


A(Y®B)=A(¥)+rL(dY) + $r°C(aY) (9.20) 


(this is intuitive and is proved by Miles, 1974b, Theorem 3). So if we 
measure A(Y@B) for different radii and plot this against radius we can 
estimate L(0Y) and C(dY) by fitting a quadratic to the curve. Forming 
Y@B involves translating by all points in B, so can be expensive. Sup- 
pose we just use T={0,h}. Let K(h)=A(YOT)=A(¥N(Y—h)). If h= 
ra for a direction a, scalar r>0 then 


b,=-|2K(ra)|| =3 f In-aias (9.21) 


where n is the outward normal at the surface. (A few sketches will 
convince you that 2K(0)—2K(ra)=r/f,,inf(0,n-a)ds. {,,n-ads=0 be- 
cause the boundary is one or more closed curves). The analyzer can 
easily form an approximation to D,. However, from (9.21) 


ave D, = L(9A){ 3 ave|n-a|} =L(84)/a (9.22) 


so by averaging over directions we can estimate the perimeter length, and 
if y is the line segment from 0 to rB, B perpendicular to a, 


D,(Y®y)—D,(Y)=rC(aY) (9.23) 
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This measurement is simple with a rectangular lattice binary representation 
of Y. 


Counting objects 


It is far more difficult than appears at first sight to devise an unbiased 
method for counting the number of objects within a window D. What 
should the observer count for objects that meet the boundary of D? The 
obvious suggestion to estimate the proportion of the object within D only 
works if the observer can see beyond the window, and is time consuming. 

The net tangent count illustrated in Figure 9.3 provides a simple way to 
manually measure C(0Y) and hence the number of bodies if these have no 
holes. It works well even if we can only observe dY(MD, for if a large 
domain is partitioned into subregions, the sum of the counts in the 
subregions will be the total count, so there is no systematic edge-effect 
bias. DeHoff (1978) gives an expository account of the tangent count 
method. An image analyzer can recognize tangent counts as shown in 
Figure 9.3. 


° ° oO ° fe) ° | e) ° | 


1°) | | | | ro) | | | | 
(b) 
Fig. 9.3 (a) Illustration of the tangent count. Locate horizontal edges and score the four 


cases as shown. (b) Image analyser patterns (of arbitrary width) corresponding to tangent 
detection. 


BASIC QUANTITIES 203 


Figure 9.4 illustrates the Associated Point and tiling methods. The AP 
method defines a unique point on each object and includes an object in the 
count when its associated point is within the window. An AP can be 
defined automatically by the illustrated “lower left” rule. This is precisely 
the tiling method of Gundersen (1977) for rectangular tiles. Miles (1978a) 
states that at least one image analyzer can find associated points as a 
standard function. It is not possible to find an AP from that part of an 
object within D. 

Miles (1974b) describes two methods based on weighting objects. For 
“minus sampling” he considers only those objects totally contained in D. 
The idea is similar to that used for K in Section 8.3. For each object O, 
contained in D compute a measurement T, and 


M, =area{x|x +O, CD} =area( DCO, ) 

so M, measures the freedom of O, to move in D. Then 
E(3T,/M,)=pP(M>O0)E(T|M>0) (9.24) 
where p is the number of objects per unit area, and the conditioning arises 


because large objects will never be measured. For counting 7;=1, so 
=1/M, is an unbiased estimator of the intensity of objects that will fit 


AP 


(a) 


Fig. 9.4 (a) Definition of an associated point by the “lower left” rule. (6) One of 
Gundersen’s tiling patterns. Objects are counted which have some point within the rectangle 
and do not meet the solid line. 
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within D. Plus sampling is similar; all objects that meet D are included in 
the summation, and the weight is M, =area(D@®O,). Miles computes M 
for various objects within a circular domain D. This method is only 
practical for image analyzers. Even these have to recognize distinct 
objects before computing D@O, or D@O,. 

Gundersen (1978) describes an elaborate tangent and edge count method 
of Giger (1967) and Weibel (1973). 


9.3 STEREOLOGICAL SAMPLING 


The purpose of this section is to investigate estimators of V(Y), S(OY), 
K(0Y), and G(dY) for a phase Y embedded in a specimen X; we assume X 
to be convex. We assume that the content of X can be measured exactly. 
We denote by 7 and L a planar section and a linear probe respectively and 
allow only measurements in the section or the line. Then the available 
estimators are given in Table 9.1. Some of these formulas go back to 
Delesse (1848). The ratio form of these estimates is intuitively sensible 
and will be justified by random sampling. To be specific we mainly 
consider planar sampling of a three-dimensional body. 

There is no such object as a uniformly distributed plane in R°, but there 
is an invariant measure for planes. We denote by H(X) the measure of 
the set of planes which hit X¥. We can construct a plane (called an JUR 
plane, for isotropic uniform random) conditional on hitting X with proba- 
bility proportional to this measure by taking a ball containing X, and 
generating an independent uniformly distributed normal ¢ and distance p 


Table 9.1 Stereological Relationships 


Dimension 
Of x Of Section Of Y Estimate Of 

3 2 3 A(YNT)/A(XN T) VIY)/V(X) 

3 2 2 LIYNT)/A(XNT) (7/4)S(Y)/ V(X) 
3 2 2 CIYNT)/A(XNT) K(Y)/V(X) 

3 2 1 P(YNT)/A(XNT) 4L(Y)/V(X) 

3 1 3 LIYNL)/LUXNL) V(Y)/V(X) 

3 1 2 P(YNL)/L(XNL) 3S(Y)/V(X) 

2 1 2 L(YNL)/L(XNL) A(Y)/A(X) 

2 1 1 P(YNL)/L(XNL) (2/a7)L(Y)/A(X) 
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from the center of the ball. These determine an JUR plane conditional on 
hitting the ball. If we reject those realizations which miss X we have an 
IUR plane T through X. Then 


E{A(YNT)} =V(¥)/H(X) 
E{L(YnT)} =(2/4)S(Y)/H(X) (9.25) 
E(C(¥NT)} =K(Y)/H(X) 
E{P(YNT)}=3L(Y)/H(X) 

From the description of the JUR plane it is clear that a typical left hand 


side is, for convex Y, 


z{ f WYN s,) ds} /H(X) (9.26) 


where the expectation is over a uniform direction S and S, is the plane with 
normal S and distance s from the origin of the ball. Now (9.26) is an 
increasing additive functional invariant under rigid motions and of “di- 
mension” 3~—k, so by Hadwiger’s theorem it is a multiple of WO(Y). By 
taking Y as a unit ball, we can find the multiple. Extending the argument 
to e-dimensional sections of d-dimensional bodies we have 


BQH AT)) = (WEOD/HOD} | 


(9.27) 
This holds for convex Y; we can extend by additivity to ovoids. (This 
argument follows Matheron, 1975, p. 82. An alternative derivation avoid- 
ing Hadwiger’s theorem is given by Miles and Davy, 1976; Davy and 
Miles, 1977.) 

From (9.25) and (9.27) we have that all the results of Table 9.1 satisfy 
E(A)/E(B)=C when 4/B estimates C. However, the expectation of the 
ratio is not the ratio of the expectations. If m sections are taken 
(A, +°+°: +4,,)/(B, +:++ +B,,) will be a reasonably unbiased estimator, 
but the point of using ratios is that A, and B, will be highly correlated. 
The standard remedy is weighted samples, introduced in this context by 
Miles and Davy (1976), Davy and Miles (1977) and Miles (1978b). If the 
probability density of a section is proportional to B, then 


Ex 4/B)= { (5) ey Pn E/E) (9.28) 
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Thus the estimators of Table 9.1 will be unbiased if we weight planes by 
A(XMT) and lines by L(X¥ML). Clearly, these are sensible weightings. 
In fact, generating weighted planes is as easy as generating JUR planes. 
Choose a random point within X and an independent random direction, 
then use the plane through the point perpendicular to the chosen direction. 
To find a length-weighted line, choose an area-weighted plane, take a 
random point in this plane and a line with an independent uniform 
direction through that point. Davy and Miles (1977) prove formally that 
these algorithms do yield the desired distributions of planes and lines. 

Note that these results apply only to the randomization used in sam- 
pling, and not to any inference from the specimen X to its population. 
Furthermore, no variance results are given, although Miles and Davy 
(1976) do repeat a standard argument suggesting that weighted samples 
will have a smail variance. In practice, random sampling is probably 
impossible. Unbiasedness can be proven under other conditions. If the 
specimen X is a cuboid it could be sectioned at a random height parallel to 
one of the faces, so A(X MT) is constant. However, we cannot use (9.25) 
(except the first formula), since sampling is not isotropic, unless we assume 
that the specimen was chosen isotropically from a population. Again, the 
estimators of Table 9.1 will be unbiased. These assumptions seem closer 
to the procedures used in practice. Serial sections are commonly taken 
from a cuboid. Estimates from each are then unbiased but correlated. 
So the elegant mathematics that results from random sampling is unsatis- 
factory both statistically and in practice. 

There are two important omissions from Table 9.1. We have no 
estimators of G(dY ) from planar sections, nor of C(dY) from linear probes 
of planar objects. These are important, for they are essentially counts of 
the number of components of the Y phase. Miles and Davy (1978) have 
found an unbiased estimator of G(dY) from measurements at the edge of a 
wedge-shaped section, but this does not seem practical. 

Practical problems that need further attention include the use of serial 
sections and measurements other than the basic ones derived from 
Minkowski functionals. The oil industry is interested in the topology and 
many other properties of the pore space related to fluid flow, for example. 
The new science of X-ray tomography (see Shepp and Kruskal, 1978 for a 
mathematical report) may allow a three-dimensional digitized image to be 
produced that allows a much greater selection of measurements and 
eliminates the stereology completely! 


9.4 SIZE DISTRIBUTIONS 


Determining the grain sizes in a sedimentary rock is an important part of 
the description of the rock. In practice, these are measured by comparing 
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a planar section with a set of standard charts whose number of grains per 
unit area is known. Specifically, the ASTM grain number is g, where 
50 x 28 is the number of grains per square inch, and charts are available for 
small integers g. Clearly, at this level of precision it is not necessary to 
define too carefully what is being measured. There are two problems in a 
more formal treatment. One is to measure the apparent grain sizes in a 
planar sample. This is considered under the heading of granulometry. 
The other is to infer the true distribution of grain sizes from that observed 
in a planar sample, known as unfolding. We ignore the problems of 
breaking grains during preparation of the sample, which can be either by 
cracking the rock or polishing a section, but do consider that the smallest 
grains might fail to be observed or be removed during the polishing 
process and the pits left go undetected. 


Granulometry 


Suppose we have a planar image of disjoint grains and pore space, to be 
analyzed by an image analyzer. Let A be the set of grains. Matheron 
(1967, 1975) abstracted the notion of sieving grains. He considered a 
family y, of transformations on A, where (A) will be those grains too 
large to pass through the sieve of size A. Clearly we would demand 


1. ,(A)CA 

2. A>w => ¥(A)CY,(A) 

3. 4,(Y9(A)) = Yananca, (A) 

4. BCCCA>y,(B)cy,(C) 

5. Equivariance under rigid motions T, so (TA) =Ty)( A) 


Fortunately we can achieve these without identifying the grains them- 
selves. Consider ¥,(A)=AwB,, where w is the opening defined in Section 
9.1. We need B, to increase with A and B,=B,wB, forA>p. Typically, 
B, is a ball of diameter A, in which case y,(.A) contains contributions only 
from those grain images with diameter (the greatest distance between two 
points in the grain) exceeding A. Then a grain size distribution function 
can be defined by 


G(A)=1—area(y,(A)) /area( A) (9.29) 


If the grain images are circular G is just the empirical distribution function 
of the image diameters. If the image is viewed through a window, some 
edge images will be noncircular and a slight bias will be introduced; in 
particular, G(A) will no longer be a step function. 

It is quicker for both automatic and manual measurements to take a 
linear granulometry in which B, is a line segment of length A. For an 
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image analyzer, this would be taken to be in the direction of the scan. 
Suppose A is scanned in a series of parallel traverses. Then the total 
traverse length across grains is proportional to area( A), and area( AwB,) is 
proportional to the sum of all traverse lengths across grains which exceed 
A. Thus G(A) is calculated from 


G(A) = ZT MT, >A)/2T, 


for grain traverse lengths (7;). The larger grains will be measured on 
several traverses, so G(A) does not estimate the grain diameter distribution 


F, but 
if *sdF(s) / i) ” sdF(s) 


Of course, F can be recovered from this weighted distribution. 


Unfolding 


Unfolding is the term used to describe the derivation of the distribution F 
(possibly with density f) of the particle sizes in R? from the density g or 
distribution function G of their observed sizes in a planar section. To be 
specific, we will consider spherical particles the sizes of which are mea- 
sured by their diameters. This problem was posed and solved analytically 
by Wicksell (1925). Suppose that the centers of the spheres are from a 
Poisson process of intensity A, low enough for the spheres not to overlap. 
Then if m is the first moment of F, 


a(v=y f(x? —y?)"'? dF(x)/m (9.30) 


A random JUR sectioning plane will give the same formula for any 
collection of spheres in a compact body. It follows from two facts: 
spheres are cut with probabilities proportional to their diameters, and 
conditional on being cut the cut is uniformly distributed across the sphere. 
Thus, P(observed diameter >y|sphere of diameter x cut)=p(y, x)= 
V(x? —y?)/x and 


a(v)= fo - SPU x)xdF(x)/m 
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The inversion of this Abel-type integral equation is straightforward. 


io 2 


(u2 —x?)'”? 
aoe ee Prog ay 1/27 2 2-1/2 
aaa f Kc wy) (9? =x?) ydy | dF(v) 
and the term in { } is a constant, in fact 7/2. Thus 


1 Fx)= A fu? x8)" aG(u) (9.31) 


f(x)=- ae Cs = xt) PE (ue ig(u)) du: (9.32) 


The constant m is the mean of F and so unknown; it is found from F(0)=0 
or fo° f(x) dx=1. We can relate the (noncentral) moments of F and G by 


Va T{3(k+2)} Bal F) 


1,(G)=—— (9.33) 
2 r{i(x+3)}™ 

Nicholson (1970) noted that we may not be able to observe small cross 
sections and proposed a truncation point yo for the diameters of the circles; 
smaller circles are ignored. Then (9.31) and (9.32) hold for x > yp. How- 
ever, there is now no information on F(y) —), the number of spheres of 
diameter less than yy and so m is unknown. We must use the conditional 
distribution for diameters not less than yo. 

A slightly different experimental setup is to view thin slices of particles 
embedded in a translucent material. Then what is observed is not the 
diameter of a cross section, but the maximum diameter within a slice of 
thickness ¢. We can amend (9.30) to 


r= ef AMaeray] — 038 


The inversion of (9.34) is more complicated, particularly if a truncation 
point is used, but has been considered by Bach (1967) and others. 

The estimation of m is important, for we would expect to see AmA 
circles in a section of area. A. The estimation of A is often the main aim of 
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the investigation, so we need an estimate m of m to find 
A=number of circles //iA (9.35) 


Note that with no assumption on the particle shape we failed to estimate 
G(98Y)/4z, the number of particles, in Section 9.3. From (9.33) 


m=n/2f u-'dG(u) (9.36) 


It should be clear from (9.35) and (9.36) that small diameters are important 
in finding m and A, yet these may be observed or measured or counted 
inaccurately. Suppose the observed circle diameters are y,,...,¥,. To 
use (9.31) or (9.36) we have to estimate G. The obvious choice is to take 


the empirical distribution function G, of {y,..., y,} and estimate m and F 
by 
m=n/{221/y;} (9.37) 
A(x)=1-{ > (F-x""\/ea/7) (9.38) 
Yirx 


Watson (1971b) investigated these estimators. F is not monotone, de- 
creasing to minus infinity at each y,, and both estimators have infinite 
variance. The estimator m is asymptotically Normal, but at a rate that 
depends on the behavior of F(x) for small x. Simulations showed the 
estimates to be useless. 

The usual remedy for such behavior is to smooth G,, either nonparamet- 
tically or by fitting a parametric family for F or G. Anderssen and 
Jakeman (1975a, b) (using results from Jakeman and Anderssen, 1975) 
showed that the numerical inversion of (9.31) and (9.32) is unstable, as 
might be expected from Watson’s results. Thus if a parametric family is 
used for G it is important that the inversion be performed analytically. 
Anderssen and Jakeman recommend using (9.31) with product integration, 
which is a means of interpolating G, before integrating. They show this 
gives a monotone, consistent estimate of F (as noo). If the density f is 
required they recommend spectral differentiation of this estimate, which is 
again a smoothing technique. 

Keiding et al. (1972) parametrized F as the distribution of the square 
root of a gamma variable with integer shape parameter, for which g can be 
found analytically from (9.30). They found maximum likelihood esti- 
mates numerically. In fact, they extended the parametric family by 
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considering mixtures with three different scale factors, and assuming that 
glancing cuts on particles were missed. It was assumed that only cuts 


with y as shown in Figure 9.5 greater than were recorded. Then (9.30) 
becomes 


NEES, 


g(y)= 22)" '/? GF(x) 


oral 


They estimated a shape parameter, a scale parameter, ¢, and the propor- 
tions in each of the three populations. Bimodal distributions F occur in 
practice so it seems important to take mixtures. Anderssen and Jakeman 
(1975b) give an example in which a restricted form of this parametric 
technique (¢=0, no mixtures) performs less well than does their own 
nonparametric scheme. 

A parametric technique has advantages in the final stages of the analysis 
but some nonparametric estimate seems to be needed to choose the 
number of mixture components and give good guesses for the parameters 
to start the numerical maximization of the likelihood. 

Surely the commonest way to smooth an empirical distribution function 
is to use a histogram. Often the observed diameters will have been 
grouped during the measuring process. We can regard (9.30) as a triangu- 
lar system of linear equations between histogram estimates. If the bins 


are labeled | to A with midpoints a,, upper limits b, and widths w, then we 
approximate (9.30) by 


w,8(a,)=a; > wf(a,)/{m(b?—a?)'7}. (9.39) 


jai 


Fig. 9.5 Definition of the upper limit ¢ of the angle of incidence. 
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The upper limit 5, is used to avoid singularities. (Anderssen and Jakeman, 
1975b discuss other methods in their Appendixes.) The procedure of 
Saltykov (1967) is essentially the solution of (9.39) by back substitution. 
All the observations in the largest size bin are assumed to come from 
spheres of diameter b,; f(a,) is then computed to give the correct number 
in the largest bin and the contributions from those spheres to all the 
smaller diameter bins subtracted from the counts in those bins. This 
process is repeated for the second largest bin and so on. Eventually, the 
contents of some (small diameter) bin wili become negative, and the 
densities for this bin and all smaller diameters are set to zero. Greeley 
and Crapo (1978) investigated the accuracy of this method for m by a 
simulation experiment. For 100-400 observed circles they recommended 
10 bins as a compromise between accuracy and sampling variability. The 
method is simple and copes well with unobserved small circles. It could 
be developed by introducing criteria for amalgamating or splitting bins to 
give a good manual method of inversion. If the fine structure of f is 
important then Anderssen and Jakeman’s method would be preferred. 

So far we have only considered “size” as the diameter of a spherical 
particle. Wicksell (1926) considered ellipsoidal particles. Cruz Orive 
(1976, 1978) showed that one can unfold the distribution of a two-parameter 
family of ellipsoids of revolution but not the three-parameter family of 
general ellipsoids. The device originally suggested by Wicksell of regard- 
ing near-spherical particles and their cross sections as spheres and circles 
of the same volume and area is supported by the numerical experiments of 
Anderssen and Jakeman (1975) and Greeley and Crapo (1978). It seems 
sufficiently accurate in practice when the ratio of the longest axis to the 
shortest axis of an ellipsoidal particle does not exceed 2. 

It is also possible to unfold the distribution of the length of line 
intercepts on a line probe. This is even more unstable, and the results of 
Nicholson (1970) strongly suggest that it should not be used whenever 
planar sectioning is possible. 

A similar unfolding problem arises when measuring the distribution of 
membrane thickness, considered by Gundersen et al. (1978) and Jensen et 
al. (1979). 
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